Students on a Mission: Tackling Online Criminal Activity Under NSF’s REU Program

From left to right: Austin, Patrick, Mia, Misty, Garrett, and Andrew.

Baylor.AI lab is thrilled to announce the launch of a new research stage in our project that uses NLP to identify suspicious transactions in omnichannel online C2C marketplaces. Under the guidance of Dr. Pablo Rivas and Dr. Tomas Cerny, two groups of undergraduate students participating in NSF’s REU program will contribute to this exciting project.

Misty and Andrew, under the direction of Dr. Pablo Rivas, will be working on designing data collection strategies for the project. Their goal is to gather relevant information to support the research and ensure the accuracy of the findings.

Meanwhile, under Dr. Tomas Cerny’s direction, Patrick, Mia, Austin, and Garrett will focus on data visualization and large graph understanding. Their role is crucial in helping to understand and interpret the data collected so far.

The research project investigates the feasibility of automating the detection of illegal goods or services within online marketplaces. As more people turn to online marketplaces for buying and selling goods and services, it is becoming increasingly important to ensure the safety of these transactions.

The project will first analyze the text of online advertisements and marketplace policies to identify indicators of suspicious activity. The findings will then be adapted to a specific context – locating stolen motor vehicle parts advertised via online marketplaces – to determine general ways to identify signals of illegal online sales.

The project brings together the expertise of computer science, criminology, and information systems to analyze online marketplace technology platform policies and identify platform features and policies that make platforms more vulnerable to criminal activity. Using this information, the researchers will generate and train deep learning-based language models to detect illicit online commerce. The models will then be applied to markets for motor vehicle parts to assess their effectiveness.

This research project represents a significant step forward in the fight against illegal activities within online marketplaces. The project results will provide law enforcement agencies and online marketplaces with valuable insights and evidence to help them crack down on illicit goods or services sold on their platforms.

We are incredibly excited to see what Misty, Andrew, Patrick, Mia, Austin, and Garret will accomplish through this project. We can’t wait to see their impact on online criminal activity research. Stay tuned for updates on their progress and more information about this cutting-edge project.

Sic’em, Bears!

Power of Data In Quantum Machine Learning

This week at the lab, we read the following paper, and here is our summary:

Huang, Hsin-Yuan, Michael Broughton, Masoud Mohseni, Ryan Babbush, Sergio Boixo, Hartmut Neven, and Jarrod R. McClean. “Power of data in quantum machine learning.” Nature communications 12, no. 1 (2021): 2631.

Summary

This work focuses on the advancement of quantum technologies and their impact on machine learning. The two paths towards the quantum enhancement of machine learning include using the power of quantum computing to improve the training process of existing classical models and using quantum models to generate correlations between variables that are inefficient to represent through classical computation. The authors show that this picture is incomplete in machine learning problems where some training data are provided, as the provided data can elevate classical models to rival quantum models. The authors present a flowchart for testing potential quantum prediction advantage based on prediction error bounds for training classical and quantum ML methods based on kernel functions. This elevation of classical models through some training samples is illustrative of the power of data. The authors also show that, “training a specific classical ML model on a collection of N training examples (\mathbf{x}, y = f(\mathbf{x})) would give rise to a prediction model h(\mathbf{x}) with

(1)   \begin{equation*} \mathbb{E}_\mathbf{x}|h(\mathbf{x})-f(\mathbf{x})|\leq c \sqrt{p^2/N} \end{equation*}

for a constant c > 0. Hence, with N \approx p^2/\epsilon^2 training data, one can train a classical ML model to predict the function f(\mathbf{x}) up to an additive prediction error \epsilon.” They also show that a slight geometric difference between kernel functions defined by classical and quantum ML guarantees similar or better performance in prediction by classical ML. On the other hand, a sizeable geometric difference indicates the possibility of a large prediction advantage using the quantum ML model.

Additionally, the authors introduced ”projected quantum kernels” and demonstrated, through empirical results, that these outperformed all tested classical models in prediction error. This work provides a guidebook for generating ML problems that showcase the separation between quantum and classical models.

Intellectual Merit

This work provides a theoretical and computational framework for comparing classical and quantum ML models. The authors develop prediction error bounds for training classical and quantum ML methods based on kernel functions, which provide provable guarantees and are very flexible in the functions they can learn. The authors also develop a flowchart for testing potential quantum prediction advantage, a function-independent prescreening that allows one to evaluate the possibility of better performance. The authors provide a constructive example of a discrete log feature map, which gives a provable separation for their kernel. They rule out many existing models in the literature, providing a powerful sieve for focusing the development of new data encodings.

Broader Impact

The authors’ contributions to the field of quantum technologies and machine learning have significant broader impacts. The development of a flowchart for testing potential quantum prediction advantage provides a tool for researchers and practitioners to determine the possibility of better performance using quantum ML models. The authors’ framework can also be used to compare and construct hard classical models, such as hash functions, which have applications in cryptography and secure communication. The authors’ work has the potential to accelerate the development of new data encodings, leading to more efficient and accurate machine learning models. This has far-reaching implications for various applications, including image recognition, text translation, and even physics applications, where machine learning can revolutionize how we analyze and interpret data. The paper was organized and written by collaborating with three famous quantum institutes: Google Quantum AI, the Institute for Quantum Information and Matter at Caltech, and the Department of Computing and Mathematical Sciences at Caltech.

NSF Award: Using NLP to Identify Suspicious Transactions in Omnichannel Online C2C Marketplaces

Baylor University has been awarded funding under the SaTC program for Enabling Interdisciplinary Collaboration; a grant led by Principal Investigator Dr. Pablo Rivas and an amazing group of multidisciplinary researchers formed by:

  • Dr. Gissella Bichler from California State University San Bernardino, Center for Criminal Justice Research, School of Criminology and Criminal Justice.
  • Dr. Tomas Cerny is at Baylor University in the Computer Science Department, leading software engineering research.
  • Dr. Laurie Giddens from the University of North Texas, a faculty member at the G. Brint Ryan College of Business.
  • Dr. Stacy Petter is at Wake Forest University in the School of Business. She and Dr. Giddens have extensive research and funding in human trafficking research.
  • Dr. Javier Turek, a Research Scientist in Machine Learning at Intel Labs, is our collaborator in matters related to machine learning for natural language processing.

We also have two Ph.D. students working on this project: Alejandro Rodriguez and Korn Sooksatra.

This project was motivated by the increasing pattern of people buying and selling goods and services directly from other people via online marketplaces. While many online marketplaces enable transactions among reputable buyers and sellers, some platforms are vulnerable to suspicious transactions. This project investigates whether it is possible to automate the detection of illegal goods or services within online marketplaces. First, the project team will analyze the text of online advertisements and marketplace policies to identify indicators of suspicious activity. Then, the team will adapt the findings to a specific context to locate stolen motor vehicle parts advertised via online marketplaces. Together, the work will lead to general ways to identify signals of illegal online sales that can be used to help people choose trustworthy marketplaces and avoid illicit actors. This project will also provide law enforcement agencies and online marketplaces with insights to gather evidence on illicit goods or services on those marketplaces.

This research assesses the feasibility of modeling illegal activity in online consumer-to-consumer (C2C) platforms, using platform characteristics, seller profiles, and advertisements to prioritize investigations using actionable intelligence extracted from open-source information. The project is organized around three main steps. First, the research team will combine knowledge from computer science, criminology, and information systems to analyze online marketplace technology platform policies and identify platform features, policies, and terms of service that make platforms more vulnerable to criminal activity. Second, building on the understanding of platform vulnerabilities developed in the first step, the researchers will generate and train deep learning-based language models to detect illicit online commerce. Finally, to assess the generalizability of the identified markers, the investigators will apply the models to markets for motor vehicle parts, a licit marketplace that sometimes includes sellers offering stolen goods. This project establishes a cross-disciplinary partnership among a diverse group of researchers from different institutions and academic disciplines with collaborators from law enforcement and industry to develop practical, actionable insights.

Self-supervised modeling. After providing a corpus associated with a C2C domain of interest and ontologies, we will extract features followed by attention mechanisms for self-supervised and supervised tasks. The self-supervised models include the completion of missing information and domain-specific text encoding for learning representations. Then supervised tasks will leverage these representations to learn the relationships with targets.

Dr. Bejarano’s work is Recognized by Amazon

According to the World Federation of the Deaf, more than 70 million deaf people exist worldwide. More than 80% of them live in developing countries. Recent research by Dr. Gissella Bejarano, our very own postdoctoral research scientist, has been recognized for its impact on computer vision and speech recognition, providing opportunities to help individuals with disabilities. With support from AWS, Dr. Bejarano is finding better ways to translate Peruvian Sign Language using computer vision and natural language processing.

Read more about this in this release by AWS.

How High can you Fly? LCC Passenger Dissatisfaction

This study provides a better understanding of the LCC passenger dissatisfaction phenomenon, as we now have an idea of which themes are important and require urgent attention. The findings show that over 10 years, the LCC passenger dissatisfaction criteria evolved, meaning that LCCs should be strongly aware of areas of concern in order to maintain passenger satisfaction. Based on classic data analytics, four themes – flight delay, ground staff attitude, luggage handling, and seat comfort – were identified as playing a crucial role in passenger dissatisfaction. Interestingly, LCC passengers were not found to have a problem with cabin crew attitude. Two possible reasons for the major themes of ground staff dissatisfaction may simply be that LCC ground staff lack training and that passengers expect ground staff to have the authority to make decisions and to be aware of passengers’ needs. Overall, when ground staff is not able to deal with passengers’ demands, passengers feel dissatisfied. In addition, the study found that the check-in counter, food, airline ground announcements, airline responses, cleanliness and additional/personal costs are secondary themes in passenger dissatisfaction. This study, therefore, clearly shows that LCCs should prioritize their efforts to minimize passenger dissatisfaction by firstly dealing with the primary themes of passenger dissatisfaction. [pdf, bib]

The Final Themes of LCC Passenger Dissatisfaction