How to Show that Your Model is Better: A Basic Guide to Statistical Hypothesis Testing

Do you need help determining which machine learning model is superior? This post presents a step-by-step guide using basic statistical techniques and a real case study! 🤖📈 #AIOrthoPraxy #MachineLearning #Statistics #DataScience

When employing Machine Learning to address problems, our choice of a model plays a crucial role. Evaluating models can be straightforward when performance disparities are substantial, for example, when comparing two large language models (LLMs) on a masked language modeling (MLM) task with perplexities of 71.01 and 28.56, respectively. However, when differences among models are minute, a solid analysis is needed to discern whether one model is genuinely superior to the others, and that can prove challenging.

This tutorial presents a step-by-step guide to determine if one model is superior to another. Our approach relies on basic statistical techniques and real datasets. Our study compares four models on six datasets using one metric, standard accuracy; other contexts may use different numbers of models, metrics, or datasets. We will work with the tables below, which show the properties of the datasets and the performance of two baseline models and two of our proposed models. Our hypothesis to be tested is that the proposed models are better than the baselines.

Summary of performance measured with standard accuracy
Summary of the main properties of the datasets considered in this tutorial.

One of the primary purposes of statistics is hypothesis testing. Statistical inference involves taking a sample from a population and determining how well the sample represents the population. In hypothesis testing, we formulate a null hypothesis, H_0, and an alternative hypothesis, H_A, based on the problem (comparing models). Both hypotheses must be concise, mutually exclusive, and exhaustive. For example, we could say that our null hypothesis is that the models perform equally, and the alternative could mean that the models perform differently.

Why is the ANOVA test not a good alternative?

The ANOVA (Analysis of Variance) test is a parametric test that compares the means of multiple groups. In our case, we have four models to compare with six datasets. The null hypothesis for ANOVA is that all the means are equal, and the alternative hypothesis is that at least one of the means is different. If the p-value of the ANOVA test is less than the significance level (usually 0.05), we reject the null hypothesis and conclude that at least one of the means is different, i.e., at least one model performs differently than the others. However, ANOVA may not always be the best choice for comparing the performance of different models.

One reason for this is that ANOVA assumes that the data follows a normal distribution, which may not always be the case for real-world data. Additionally, ANOVA does not take into account the difficulty of classifying certain data points. For example, in a dataset with a single numerical feature and binary labels, all models may achieve 100% accuracy on the training data. However, if the test set contains some mislabeled points, the models may perform differently. In this scenario, ANOVA would not be appropriate because it does not account for the difficulty of classifying certain data points.

Another issue with ANOVA is that it assumes that the variances of the groups being compared are equal. This assumption may not hold for datasets with different levels of noise or variability. In such cases, alternative statistical tests like the Friedman test or the Nemenyi test may be more appropriate.
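To make the concern concrete, here is a minimal sketch that computes a one-way ANOVA F statistic by hand on the accuracies from the performance table. It assumes a plain one-way design that treats each model's six accuracies as an independent group, which is only one way of setting up ANOVA; the function name `one_way_anova_f` is ours and purely illustrative.

```python
from statistics import mean

# Accuracies per model across the six datasets (from the performance table).
accuracies = {
    "Glorot N.":   [0.8937, 0.8023, 0.7130, 0.5084, 0.2331, 0.5174],
    "Glorot U.":   [0.8839, 0.8024, 0.7132, 0.5085, 0.2326, 0.5175],
    "Random G.":   [0.9072, 0.8229, 0.7198, 0.5232, 0.3620, 0.5307],
    "Repeated G.": [0.9102, 0.8238, 0.7206, 0.5273, 0.3952, 0.5178],
}


def one_way_anova_f(groups):
    """F statistic of a one-way ANOVA: between-group over within-group mean square."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k, n_total = len(groups), len(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


f_stat = one_way_anova_f(list(accuracies.values()))
print(f"one-way ANOVA F = {f_stat:.3f}")
```

Under this setup the F statistic comes out far below 1, because the spread of accuracies across datasets (some datasets are simply harder) dwarfs the differences between models. A rank-based test sidesteps this by comparing the models within each dataset.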

Friedman test

The Friedman test is a non-parametric test that compares multiple models. In our example, we want to compare the performance of k=4 different models, i.e., two baseline models, Gabor randomized, and Gabor repeated, on N=6 datasets. First, the test ranks the models on each dataset, with the best-performing model receiving a rank of 1, and then averages each model’s ranks across datasets. The Friedman test then tests the null hypothesis, H_0, that all models are equally effective, in which case their average ranks should be equal. The test statistic is calculated as follows:

(1)   \begin{equation*} \chi_{F}^{2}=\frac{12 N}{k(k+1)}\left[\sum_{j=1}^{k} R_{j}^{2}-\frac{k(k+1)^{2}}{4}\right] \end{equation*}

where R_{j} is the average rank of the j-th model across the N datasets.

The test result can be used to determine whether there is a statistically significant difference between the performance of the models by checking whether \chi_{F}^{2} exceeds the critical value of the chi-squared distribution with k-1 degrees of freedom at a chosen significance level \alpha. However, since \chi_{F}^{2} could be too conservative, we also calculate the F_F statistic, which follows the F distribution with k-1 and (k-1)(N-1) degrees of freedom:

(2)   \begin{equation*} F_{F}=\frac{(N-1) \chi_{F}^{2}}{N(k-1)-\chi_{F}^{2}}. \end{equation*}

Based on the critical value, F_{F}, and \chi_{F}^{2}, we evaluate H_0; once the null hypothesis is rejected, we apply a posthoc test. For this, we use the Nemenyi test to establish whether models differ significantly in their performance.
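The two statistics above can be sketched in a few lines of plain Python. This is a minimal illustration under our own assumptions: the helper names `average_ranks` and `friedman_statistics` are hypothetical, and ties are resolved by giving tied models the mean of their positions, in line with average-style ranking.

```python
def average_ranks(scores):
    """Rank models within each dataset (1 = best), then average over datasets."""
    n_models = len(scores[0])
    totals = [0.0] * n_models
    for row in scores:
        # Model indices ordered from highest to lowest score.
        order = sorted(range(n_models), key=lambda j: row[j], reverse=True)
        pos = 0
        while pos < n_models:
            # Find the run of tied scores; each gets the mean of its positions.
            end = pos
            while end + 1 < n_models and row[order[end + 1]] == row[order[pos]]:
                end += 1
            tied_rank = (pos + 1 + end + 1) / 2
            for j in order[pos:end + 1]:
                totals[j] += tied_rank
            pos = end + 1
    return [t / len(scores) for t in totals]


def friedman_statistics(avg_ranks, n_datasets):
    """Return (chi2_F, F_F) following equations (1) and (2)."""
    k = len(avg_ranks)
    chi2 = 12 * n_datasets / (k * (k + 1)) * (
        sum(r ** 2 for r in avg_ranks) - k * (k + 1) ** 2 / 4
    )
    f_f = (n_datasets - 1) * chi2 / (n_datasets * (k - 1) - chi2)
    return chi2, f_f


# Rows are datasets, columns are models (same order as the performance table).
scores = [
    [0.8937, 0.8839, 0.9072, 0.9102],
    [0.8023, 0.8024, 0.8229, 0.8238],
    [0.7130, 0.7132, 0.7198, 0.7206],
    [0.5084, 0.5085, 0.5232, 0.5273],
    [0.2331, 0.2326, 0.3620, 0.3952],
    [0.5174, 0.5175, 0.5307, 0.5178],
]

avg_ranks = average_ranks(scores)
chi2, f_f = friedman_statistics(avg_ranks, n_datasets=len(scores))
print("average ranks:", [round(r, 3) for r in avg_ranks])
print(f"chi2_F = {chi2:.3f}, F_F = {f_f:.3f}")
```

Keeping the ranking and the statistics in separate functions makes it easy to plug in average ranks obtained elsewhere, for example from a pandas DataFrame.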

We will start the process of getting this test done by ranking the data. First, we can load the data and verify it with respect to the table shown earlier.

import pandas as pd
import numpy as np

data = [[0.8937, 0.8839, 0.9072, 0.9102],
        [0.8023, 0.8024, 0.8229, 0.8238],
        [0.7130, 0.7132, 0.7198, 0.7206],
        [0.5084, 0.5085, 0.5232, 0.5273],
        [0.2331, 0.2326, 0.3620, 0.3952],
        [0.5174, 0.5175, 0.5307, 0.5178]]

model_names = ['Glorot N.', 'Glorot U.', 'Random G.', 'Repeated G.']

df = pd.DataFrame(data, columns=model_names)

print(df.describe())  #<- use averages to verify if matches table


       Glorot N.  Glorot U.  Random G.  Repeated G.
count   6.000000   6.000000   6.000000     6.000000
mean    0.611317   0.609683   0.644300     0.649150
std     0.240422   0.238318   0.206871     0.200173
min     0.233100   0.232600   0.362000     0.395200
25%     0.510650   0.510750   0.525075     0.520175
50%     0.615200   0.615350   0.625250     0.623950
75%     0.779975   0.780100   0.797125     0.798000
max     0.893700   0.883900   0.907200     0.910200

Next, we rank the models and get their averages like so:

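data = df.rank(axis=1, method='average', ascending=False)  # rank within each dataset (row); 1 = best
print(data, data.describe(), sep='\n\n')  # per-dataset ranks, then their summary ('mean' row = average ranks)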


   Glorot N.  Glorot U.  Random G.  Repeated G.
0        3.0        4.0        2.0          1.0
1        4.0        3.0        2.0          1.0
2        4.0        3.0        2.0          1.0
3        4.0        3.0        2.0          1.0
4        3.0        4.0        2.0          1.0
5        4.0        3.0        1.0          2.0

       Glorot N.  Glorot U.  Random G.  Repeated G.
count   6.000000   6.000000   6.000000     6.000000
mean    3.666667   3.333333   1.833333     1.166667
std     0.516398   0.516398   0.408248     0.408248
min     3.000000   3.000000   1.000000     1.000000
25%     3.250000   3.000000   2.000000     1.000000
50%     4.000000   3.000000   2.000000     1.000000
75%     4.000000   3.750000   2.000000     1.000000
max     4.000000   4.000000   2.000000     2.000000

With this information, we can expand our initial results table to show the rankings by dataset and the average rankings across all datasets for each model.

Now that we have the rankings, we can proceed with the statistical analysis and do the following:

(3)   \begin{align*} \chi_{F}^{2}&=\frac{12 \cdot 6}{4 \cdot 5}\left[\left(3.667^2+3.333^2+1.833^2+1.167^2\right)-\frac{4 \cdot 5^2}{4}\right] \nonumber \\ &\approx 15.40 \nonumber  \end{align*}

(4)   \begin{equation*} F_{F}=\frac{5 \cdot 15.40}{6 \cdot 3-15.40}\approx 29.62 \nonumber \end{equation*}

The critical value of the F distribution at \alpha=0.01 is 5.417. Because our F_F statistic is well above this critical value, we reject H_0 at the 0.01 significance level, that is, with 99% confidence.

The critical value can be obtained from any table that has the F distribution. In the table the degrees of freedom across columns (denoted as df_1) is k-1, that is the number of models minus one; the degrees of freedom across rows (denoted as df_2) is (k-1)\times(N-1), that is, the number of models minus one, times the number of datasets minus one. In our case this is df_1=3 and df_2=15.

Nemenyi Test

The Nemenyi test is a post-hoc test that compares multiple models after a significant result from Friedman’s test. The null hypothesis for Nemenyi is that there is no difference between any two models, and the alternative hypothesis is that at least one pair of models is different.

The formula for Nemenyi is as follows:

    \[CD = q_{\alpha} \sqrt{\frac{k(k+1)}{6N}}\]

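where CD is the critical difference, q_{\alpha} is the critical value based on the Studentized range statistic divided by \sqrt{2} at the chosen significance level, k is the number of models, and N is the number of datasets. Two models perform significantly differently if their average ranks differ by at least CD. The q_{\alpha} value can be obtained from the following table: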

Critical values for the Nemenyi test, which is conducted following the Friedman test, with two-tailed results.

Thus, for our particular case study, the critical differences are:

(5)   \begin{equation*} CD_{\alpha=0.05}=2.569 \sqrt{\frac{4 \cdot 5}{6 \cdot 6}} = 1.915 \nonumber \end{equation*}

(6)   \begin{equation*} CD_{\alpha=0.10}=2.291 \sqrt{\frac{4 \cdot 5}{6 \cdot 6}} = 1.708 \nonumber \end{equation*}

Since the difference in average rank between the randomized Gabor and the baseline Glorot normal is 1.83, which is greater than CD_{\alpha=0.10}=1.708, we conclude that Gabor is better. Similarly, since the difference in average rank between the repeated Gabor and the baseline Glorot uniform is 2.17, which is greater than CD_{\alpha=0.05}=1.915, we conclude that Gabor is better. So yes, there is sufficient statistical evidence to show that our model is better with high confidence.
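The pairwise comparison can also be checked programmatically. The sketch below is a minimal illustration: the average ranks come from the ranking step, the q_alpha values for k=4 are the ones used in the text, and the helper names `critical_difference` and `significant_pairs` are our own.

```python
from itertools import combinations
from math import sqrt

# Average ranks over the six datasets (22/6, 20/6, 11/6, 7/6).
AVG_RANKS = {
    "Glorot N.": 22 / 6,
    "Glorot U.": 20 / 6,
    "Random G.": 11 / 6,
    "Repeated G.": 7 / 6,
}
N_DATASETS = 6
Q_ALPHA = {0.05: 2.569, 0.10: 2.291}  # critical values for k = 4 models


def critical_difference(q_alpha, k, n_datasets):
    """Nemenyi critical difference: q_alpha * sqrt(k (k + 1) / (6 N))."""
    return q_alpha * sqrt(k * (k + 1) / (6 * n_datasets))


def significant_pairs(avg_ranks, cd):
    """Pairs of models whose average ranks differ by at least the critical difference."""
    return [
        (a, b)
        for a, b in combinations(avg_ranks, 2)
        if abs(avg_ranks[a] - avg_ranks[b]) >= cd
    ]


for alpha, q in Q_ALPHA.items():
    cd = critical_difference(q, k=len(AVG_RANKS), n_datasets=N_DATASETS)
    print(f"alpha = {alpha:.2f}, CD = {cd:.3f}")
    for a, b in significant_pairs(AVG_RANKS, cd):
        print(f"  {a} vs {b}: rank difference {abs(AVG_RANKS[a] - AVG_RANKS[b]):.3f}")
```

Listing every pair, rather than only the ones of interest, also shows which comparisons do not reach significance, which is useful when reporting results.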

Things we would like to see in papers

First of all, it would be nice to have a complete table that includes the results of the statistical tests as part of the caption or as a footnote, like this:

Second of all, graphics always help! A simple and visually appealing diagram is a powerful way to represent post hoc test results when comparing multiple classifiers. The figure below, which illustrates the data analysis from the table above, displays the average ranks of methods along the top line of the diagram. To facilitate interpretation, the axis is oriented so that the best ranks appear on the right side, which enables us to perceive the methods on the right as superior.

Comparison of all models against each other with the Nemenyi test. Models not significantly different at α = 0.10 or α = 0.05 are connected.

When comparing all the algorithms against each other, the groups of algorithms that are not significantly different are connected with a bold solid line. Such an approach clearly highlights the most effective models while also providing a robust analysis of the differences between models. Additionally, the critical difference is shown above the graph, further enhancing the visualization of the analysis results. Overall, this simple yet powerful diagrammatic approach provides a clear and concise representation of the performance of multiple classifiers, enabling more informed decision-making in selecting the best-performing model.
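The connecting lines in such a diagram can be derived directly from the average ranks and the critical difference. The following is a rough text-only sketch, not a plotting routine: the function name `connected_groups` is hypothetical, and a group is taken to be a maximal run of rank-adjacent models whose best and worst average ranks differ by less than CD.

```python
# Average ranks of the four models over the six datasets.
AVG_RANKS = {
    "Glorot N.": 22 / 6,
    "Glorot U.": 20 / 6,
    "Random G.": 11 / 6,
    "Repeated G.": 7 / 6,
}


def connected_groups(avg_ranks, cd):
    """Maximal groups of models that are not significantly different from each other."""
    ordered = sorted(avg_ranks.items(), key=lambda item: item[1])  # best rank first
    groups, last_end = [], -1
    for start in range(len(ordered)):
        end = start
        # Extend the group while the rank gap to the group's best model stays below CD.
        while end + 1 < len(ordered) and ordered[end + 1][1] - ordered[start][1] < cd:
            end += 1
        # Keep groups with at least two models that are not contained in the previous one.
        if end > start and end > last_end:
            groups.append([name for name, _ in ordered[start:end + 1]])
            last_end = end
    return groups


for alpha, cd in [(0.05, 1.915), (0.10, 1.708)]:
    print(f"alpha = {alpha:.2f} (CD = {cd})")
    for group in connected_groups(AVG_RANKS, cd):
        print("  connected:", " | ".join(group))
```

Each printed group corresponds to one bold line in the diagram; models that never appear in the same group are significantly different at that level.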

Main Sources

The statistical tests are based on this paper:

Demšar, Janez. “Statistical comparisons of classifiers over multiple data sets.” The Journal of Machine learning research 7 (2006): 1-30.

The case study is based on the following research:

Rai, Mehang. “On the Performance of Convolutional Neural Networks Initialized with Gabor Filters.” Thesis, Baylor University, 2021.

President’s Executive Order for Advancing Racial Equity in AI Systems: What It Means for the Future of AI-Based Technology

Summary: The President of the United States, Joe Biden, has recently authorized an Executive Order intending to enhance racial equity and foster support for marginalized communities via the federal government. The Order mandates that federal agencies employing artificial intelligence (AI) systems assume novel equity responsibilities and instructs them to forestall and rectify any form of discrimination, including safeguarding the public from the perils of algorithmic discrimination.

What you should know: The recent Executive Order on Further Advancing Racial Equity and Support for Underserved Communities Through The Federal Government emphasizes the importance of advancing equity for all, including communities that have long been underserved, and addressing systemic racism in the US policies and programs. This order implies that AI systems should be designed to ensure that they do not perpetuate or exacerbate inequities and should be used to address the unfair disparities faced by underserved communities. It is also implied that the Federal Government should work with civil society, the private sector, and State and local governments to redress unfair disparities and remove barriers to Government programs and services, which could be facilitated by the development and deployment of ethical and responsible AI systems. Additionally, the order emphasizes the need for evidence-based approaches to equitable policymaking and implementation, which can be achieved through collecting and analyzing data on the impacts of AI systems on different communities. Therefore, AI practitioners should ensure that their systems are designed, developed, and deployed to promote equity, fairness, and inclusivity and are aligned with the Federal Government’s commitment to advancing racial equity and supporting underserved communities.

The Center for Standards and Ethics in Artificial Intelligence (CSEAI)

Following the President’s Executive Order, we at the CSEAI recognize the critical role of artificial intelligence in promoting fairness, accountability, and transparency. As a research center committed to developing responsible AI techniques, we believe our work can help meet the challenges and opportunities of emerging regulation, standardization, and best practices in AI systems. We are inviting industry members to partner with us financially and take part in collaborative research on trustworthy AI. Our mission is to provide applicable, actionable, standard practices in trustworthy AI and train a workforce that enables fairness, accountability, and transparency. We believe our work will help mitigate the operational, liability, and reputation risks of AI adoption.

The CSEAI brings together leading universities to conduct collaborative research in responsible AI techniques. We are committed to workforce development and providing accessible standards, best practices, testing, and compliance. We are proud to be a part of the NSF IUCRC Program and are excited to be supported by the NSF, which provides a standard agreement, organizational, and legal framework.

Join us in creating a better future for all Americans by developing responsible AI practices that promote fairness, accountability, and transparency. By partnering with the CSEAI, you will have the opportunity to work with a dedicated team of researchers, participate in cutting-edge research, and help shape the future of AI. Contact us today to learn more about partnering with the CSEAI.

Contact and find out more at

International Conference on Emergent and Quantum Technologies (ICEQT’23)

July 24-27, 2023 — Las Vegas, NV

Dear Esteemed Colleagues,

Quantum computing is a rapidly emerging interdisciplinary research area that integrates concepts from mathematics, physics, and engineering. For scientific rigor and successful progress in the field, it demands contributions from various STEM areas.

In this context, we are pleased to announce the International Conference on Emergent and Quantum Technologies (ICEQT’23), to be held on July 24-27, 2023, in Las Vegas, NV. The conference aims to provide an opportunity for researchers in the field of quantum machine learning and machine learning researchers interested in applying AI to enhance quantum computing algorithms, to present and discuss recent advancements in their areas of expertise.

Notably, there has been an increasing interest from machine learning researchers to apply AI to the quantum computing domain, and vice versa. As a result, we cordially invite submissions of original research papers that present state-of-the-art contributions in the following areas:

Foundations of Quantum Computing and Quantum Machine Learning

  • Quantum computing models and paradigms, e.g., Grover, Shor, and others
  • Quantum algorithms for Linear Systems of Equations
  • Quantum Tensor Networks and their Applications in QML

Quantum Machine Learning Algorithms

  • Quantum Neural Networks
  • Quantum Hidden Markov Models
  • Quantum PCA
  • Quantum SVM
  • Quantum Autoencoders
  • Quantum Transfer Learning
  • Quantum Boltzmann machines
  • Theory of Quantum-enhanced Machine Learning

AI for Quantum Computing

  • Machine learning for improved quantum algorithm performance
  • Machine learning for quantum control
  • Machine learning for building better quantum hardware

Quantum Algorithms and Applications

  • Quantum computing: models and paradigms
  • Quantum algorithms for hyperparameter tuning (Quantum computing for AutoML)
  • Quantum-enhanced Reinforcement Learning
  • Quantum Annealing
  • Quantum Sampling
  • Applications of Quantum Machine Learning

Fairness and Ethics in Quantum Machine Learning

We look forward to receiving your submissions and to welcoming you to ICEQT’23.

All submissions that are accepted for presentation will be included in the proceedings published by IEEE CPS. To ensure consistency in formatting, authors should follow the general typesetting instructions available on the IEEE’s website, including single-line spacing and a 2-column format. Additionally, authors of accepted papers must agree to the IEEE CPS standard statement regarding copyrights and policies on electronic dissemination.

Prospective authors are encouraged to submit their papers through the conference’s evaluation website at More information about the conference, including submission guidelines, can be found on our website at

Important Deadlines

April 12, 2023: Submission of papers:
– Full/Regular Research Papers (maximum of 8 pages)
– Short Research Papers (maximum of 5 pages)
– Abstract/Poster Papers (maximum of 3 pages)

May 1, 2023: Notification of acceptance (+/- two days)

May 16, 2023: Final papers + Registration

June 21, 2023: Last day for hotel room reservation at a discounted price.

July 24-27, 2023: The 2023 World Congress in Computer Science, Computer Engineering, and Applied Computing (CSCE’23: USA)
Which includes the International Conference on Emergent and Quantum Technologies (ICEQT’23)

Dr. Pablo Rivas, Baylor University
Dr. Javier Orduz, Earlham College

Diving Into Large Language Models: An Exploration of ChatGPT and Its Alternatives

An abstract illustration that depicts a central hub or nucleus from which lines and arrows radiate outwards to represent the different layers.

Large Language Models (LLMs) have become a hot topic in the world of machine learning, with chatbots like ChatGPT and other models gaining widespread popularity. However, keeping up with the latest research and advancements in this rapidly evolving field can be challenging. To help you catch up, we’ve compiled a list of 11 essential research papers that every LLM enthusiast should read. From the original Transformer architecture to recent innovations in efficiency and alignment, these papers will give you a comprehensive understanding of the field and help you stay ahead of the curve. So whether you’re a seasoned LLM practitioner or just getting started, read on to discover the key papers that will take your understanding of this exciting field to the next level.

Foundational Papers on LLM Architecture and Pretraining:

  • “Attention is All You Need” by Vaswani et al.: This paper introduces the Transformer architecture, which uses scaled dot-product attention to process sequences of tokens. It has since become the basis for many state-of-the-art LLMs.
  • “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding” by Devlin et al.: This paper describes BERT, a powerful LLM that uses masked language modeling to pre-train a bidirectional Transformer encoder. BERT has achieved impressive results on various natural language processing tasks.
  • “Improving Language Understanding by Generative Pre-Training” by Radford et al.: This paper introduces GPT, an LLM that uses a Transformer decoder to generate text based on a given prompt. It was one of the first models to demonstrate the effectiveness of large-scale unsupervised pretraining.
  • “BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension” by Lewis et al.: BART is an LLM that combines elements of both encoder and decoder architectures and can be fine-tuned for a variety of natural language tasks.

Methods for Improving LLM Efficiency:

  • “FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness” by Dao et al.: This paper proposes FlashAttention, an exact attention algorithm that accounts for GPU memory reads and writes, which reduces memory consumption and speeds up attention in Transformers.
  • “Cramming: Training a Language Model on a Single GPU in One Day” by Geiping and Goldstein: This paper investigates how far a Transformer language model can be trained from scratch on a single GPU in one day.

Methods for Controlling LLM Outputs:

  • “Training Language Models to Follow Instructions with Human Feedback” by Ouyang et al.: This paper introduces InstructGPT, a family of GPT-3 models fine-tuned with reinforcement learning from human feedback so that their outputs follow user instructions more closely.
  • “Constitutional AI: Harmlessness from AI Feedback” by Bai et al.: This paper proposes training a harmless assistant using AI-generated feedback guided by a written set of principles, reducing the need for human labels of harmful outputs.

Alternative (ChatGPT) LLM Architectures:

  • “BLOOM: A 176B-Parameter Open-Access Multilingual Language Model” by the BigScience Workshop: BLOOM is an open-access multilingual LLM built through a large research collaboration.
  • “Improving Alignment of Dialogue Agents via Targeted Human Judgements” by Glaese et al.: This paper introduces Sparrow, a dialogue agent developed by DeepMind that is trained with human feedback to be more helpful, correct, and harmless.
  • “BlenderBot 3: A Deployed Conversational Agent that Continually Learns to Responsibly Engage” by Shuster et al.: BlenderBot 3 is a conversational agent developed by Meta that can search the internet for information to incorporate into its responses.

Important Ethical Concerns Regarding LLMs:

  • “On the Opportunities and Risks of Foundation Models” by Rishi Bommasani et al. This paper discusses the opportunities and risks associated with “foundation models,” a new class of machine learning models trained on large and diverse datasets. The paper highlights the technical, social, and ethical challenges of deploying foundation models in various domains.
  • “GPT-3: Its Nature, Scope, Limits, and Consequences” by Luciano Floridi & Massimo Chiriatti. This paper examines the capabilities and limitations of GPT-3, a state-of-the-art language model, and argues that it is not designed to pass tests of mathematical, semantic, or ethical questions. The paper concludes that GPT-3 is not the beginning of a general form of artificial intelligence.
  • “On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜” by Emily M. Bender et al. This paper raises concerns about the risks associated with LLMs like GPT-3, including their environmental and financial costs, and recommends strategies for mitigating those risks.

Before you go ahead and start reading these papers, remember that LLMs such as ChatGPT and its alternatives have revolutionized NLP and hold immense potential for a wide range of applications. However, we must also be mindful of the ethical concerns surrounding these models, such as potential biases and risks of misuse. As the field continues to evolve, we must prioritize ethical considerations and work towards developing models that align with human values and promote the greater good. With the right approach, large language models can enable us to build a more inclusive and equitable future where AI and human collaboration can drive innovation and positive change.

Students on a Mission: Tackling Online Criminal Activity Under NSF’s REU Program

From left to right: Austin, Patrick, Mia, Misty, Garrett, and Andrew.

Baylor.AI lab is thrilled to announce the launch of a new research stage in our project that uses NLP to identify suspicious transactions in omnichannel online C2C marketplaces. Under the guidance of Dr. Pablo Rivas and Dr. Tomas Cerny, two groups of undergraduate students participating in NSF’s REU program will contribute to this exciting project.

Misty and Andrew, under the direction of Dr. Pablo Rivas, will be working on designing data collection strategies for the project. Their goal is to gather relevant information to support the research and ensure the accuracy of the findings.

Meanwhile, under Dr. Tomas Cerny’s direction, Patrick, Mia, Austin, and Garrett will focus on data visualization and large graph understanding. Their role is crucial in helping to understand and interpret the data collected so far.

The research project investigates the feasibility of automating the detection of illegal goods or services within online marketplaces. As more people turn to online marketplaces for buying and selling goods and services, it is becoming increasingly important to ensure the safety of these transactions.

The project will first analyze the text of online advertisements and marketplace policies to identify indicators of suspicious activity. The findings will then be adapted to a specific context – locating stolen motor vehicle parts advertised via online marketplaces – to determine general ways to identify signals of illegal online sales.

The project brings together the expertise of computer science, criminology, and information systems to analyze online marketplace technology platform policies and identify platform features and policies that make platforms more vulnerable to criminal activity. Using this information, the researchers will generate and train deep learning-based language models to detect illicit online commerce. The models will then be applied to markets for motor vehicle parts to assess their effectiveness.

This research project represents a significant step forward in the fight against illegal activities within online marketplaces. The project results will provide law enforcement agencies and online marketplaces with valuable insights and evidence to help them crack down on illicit goods or services sold on their platforms.

We are incredibly excited to see what Misty, Andrew, Patrick, Mia, Austin, and Garrett will accomplish through this project. We can’t wait to see their impact on online criminal activity research. Stay tuned for updates on their progress and more information about this cutting-edge project.

Sic’em, Bears!

Supercomputing Leverages Quantum Machine Learning and Grover’s Algorithm

Khanal, B., Orduz, J., Rivas, P. et al. Supercomputing leverages quantum machine learning and Grover’s algorithm. J Supercomput 79, 6918–6940 (2023). [ bib |  .pdf ]

Quantum computing, a field that has drawn significant attention recently, promises to revolutionize how we approach computational problems. In Bikram Khanal’s paper titled “Supercomputing leverages quantum machine learning and Grover’s algorithm,” we delve deep into the intricacies of quantum computing, with a particular focus on Grover’s algorithm and its potential applications in quantum machine learning. This journal article is an extended version of an earlier conference paper found here.

Understanding Quantum Computing and Grover’s Algorithm

At the heart of our paper is a comprehensive discussion of the basics of quantum computing. We shed light on Grover’s quantum algorithm, which promises a quadratic speedup for unstructured search compared to classical algorithms. Our team conducted an experiment simulating classical logic circuits by exploiting the power of amplitude amplification, a core principle behind Grover’s algorithm.

The Quantum Advantage

One of the primary takeaways from our research is the potential advantage quantum computing and quantum machine learning can offer over classical counterparts. By harnessing the efficiency of quantum computers, we can significantly reduce the reliance on supercomputing power for executing complex programs. This is particularly relevant for problems that pose challenges for classical computing methods.

Exploring Quantum Machine Learning

Our paper also delves into two promising approaches in quantum machine learning:

  1. Variational Quantum Circuits: These are quantum circuits that can be tuned variably, allowing for optimization in quantum computations.
  2. Kernel-based Quantum Machine Learning: This approach leverages the concept of kernels, which are used in classical machine learning for various tasks, and adapts them for quantum computations.
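To give a flavor of the kernel-based approach, here is a toy sketch that is not taken from the paper. It assumes a single-qubit angle encoding, where a real feature x is mapped to the state with amplitudes (cos(x/2), sin(x/2)), and it evaluates the fidelity kernel classically; the function names are hypothetical.

```python
from math import cos, pi, sin


def feature_state(x):
    """Angle-encode a real feature into single-qubit amplitudes (cos(x/2), sin(x/2))."""
    return (cos(x / 2), sin(x / 2))


def quantum_kernel(x, y):
    """Fidelity kernel: squared overlap between the two encoded states."""
    a, b = feature_state(x), feature_state(y)
    overlap = a[0] * b[0] + a[1] * b[1]
    return overlap ** 2


# Gram matrix for a few sample features; a classical kernel method
# (for example, an SVM) could consume a matrix like this one.
samples = [0.0, pi / 3, pi]
gram = [[quantum_kernel(x, y) for y in samples] for x in samples]
for row in gram:
    print([round(value, 3) for value in row])
```

On real hardware the overlap would be estimated from measurements rather than computed exactly, and richer encodings over several qubits would be used; the sketch only shows where the kernel plugs into a classical learning pipeline.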

We believe the intersection of quantum algorithms and machine learning is ripe for exploration. While there’s a lot of potential, it’s also a domain that requires rigorous research to unearth solutions that can genuinely outperform classical machine learning methods. Kernel-based approaches, in particular, hold significant promise in the near future of quantum machine learning and supercomputing.

Looking Ahead

Our journey into the world of quantum computing doesn’t end here. We have charted out an exciting roadmap for our future research:

  • Real-world Data Testing: We aim to test our algorithm using actual data from classical circuits. This will involve parsing the circuit and feeding it into our algorithm.
  • Optimizing Computational Complexity: Our goal is to enhance the efficiency of our approach, especially when compared to classical methods.
  • Leveraging Grover’s Algorithm for Machine Learning: We believe many classification problems in machine learning can be reframed as search problems, making them ideal candidates for Grover’s algorithm.
  • Kernel Methods and Grover’s Algorithm: In our future work, we plan to reformulate machine learning problems using kernel methods, aiming to solve them efficiently as search problems using Grover’s algorithm.

Finally, we are at the cusp of some groundbreaking discoveries that can redefine the supercomputing landscape. We invite readers and fellow researchers to join us on this exciting journey and keep a close watch on the advancements in this domain. Read the paper here [ bib |  .pdf ].

Quantum circuit for Grover’s algorithm with a user-defined oracle on the clauses |q0⟩ AND |q3⟩, |q1⟩ XOR |q2⟩, and |q2⟩ AND |q3⟩. Note that we need seven qubits in total. For c clauses, three in this case, we require c additional qubits and one output qubit, resulting in an (n + c + 1)-qubit quantum circuit.
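Amplitude amplification on the clauses named in the caption can be imitated with a small classical state-vector simulation. This is a simplified sketch under our own assumptions, not the circuit from the paper: it tracks only the input register q0 to q3, assumes the oracle marks assignments that satisfy all three clauses at once, and omits the clause and output qubits.

```python
from math import floor, pi, sqrt

N_QUBITS = 4  # input register q0..q3 only


def bits_of(index):
    """Decode a basis-state index into (q0, q1, q2, q3), with q0 as the least significant bit."""
    return tuple((index >> i) & 1 for i in range(N_QUBITS))


def satisfies(bits):
    """Assumed oracle condition: (q0 AND q3), (q1 XOR q2), and (q2 AND q3) all hold."""
    q0, q1, q2, q3 = bits
    return bool(q0 and q3) and (q1 != q2) and bool(q2 and q3)


n_states = 2 ** N_QUBITS
marked = [i for i in range(n_states) if satisfies(bits_of(i))]

# Start in the uniform superposition over all basis states.
amplitudes = [1 / sqrt(n_states)] * n_states
iterations = floor(pi / 4 * sqrt(n_states / len(marked)))

for _ in range(iterations):
    # Oracle: flip the sign of the marked amplitudes.
    amplitudes = [-a if i in marked else a for i, a in enumerate(amplitudes)]
    # Diffusion: reflect every amplitude about the mean amplitude.
    average = sum(amplitudes) / n_states
    amplitudes = [2 * average - a for a in amplitudes]

success = sum(amplitudes[i] ** 2 for i in marked)
print("marked assignments:", [bits_of(i) for i in marked])
print(f"iterations = {iterations}, success probability = {success:.3f}")
```

The number of iterations follows the usual rule of thumb of about (pi/4) * sqrt(N/M) for N states and M marked states; running more iterations than that starts to lower the success probability again.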

On Adversarial Examples for Text Classification by Perturbing Latent Representations

Recently, with the advancement of deep learning, several applications in text classification have advanced significantly. However, this improvement comes with a cost because deep learning is vulnerable to adversarial examples. This weakness indicates that deep learning is not very robust.

Fortunately, a couple of students in our lab, Korn Sooksatra and Bikram Khanal, noticed that the input of a text classifier is discrete, which can shield the classifier from state-of-the-art attacks that rely on small continuous perturbations. Nonetheless, previous works have generated black-box attacks that successfully manipulate the discrete values of the input to find adversarial examples. Therefore, instead of changing the discrete values, they transform the input into its embedding vector containing real values and perform state-of-the-art white-box attacks on it. Then, they convert the perturbed embedding vector back into text and call it an adversarial example. In summary, Ph.D. candidates Sooksatra and Khanal created a framework that measures the robustness of a text classifier by using the gradients of the classifier.
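The embed, perturb, and convert-back idea can be illustrated with a deliberately tiny toy, which is our own sketch and not the authors' framework. It assumes a hypothetical six-word vocabulary with hand-made 2-D embeddings, a linear sentiment scorer over the mean embedding, a single signed-gradient step (in the style of FGSM), and nearest-neighbor lookup to map perturbed vectors back to words.

```python
from math import dist

# Hypothetical vocabulary with hand-made 2-D embeddings (first axis ~ sentiment).
EMBEDDINGS = {
    "great": (1.0, 0.0), "good": (0.5, 0.0),
    "poor": (-0.5, 0.0), "awful": (-1.0, 0.0),
    "the": (0.0, 1.0), "film": (0.0, -1.0),
}
WEIGHTS = (1.0, 0.0)  # toy linear classifier: positive if score > 0


def score(tokens):
    """Classifier score: weights dotted with the mean token embedding."""
    vectors = [EMBEDDINGS[t] for t in tokens]
    mean_vec = [sum(component) / len(vectors) for component in zip(*vectors)]
    return sum(w * m for w, m in zip(WEIGHTS, mean_vec))


def sign(x):
    return (x > 0) - (x < 0)


def attack(tokens, epsilon):
    """One signed-gradient step in embedding space, then snap back to the nearest words."""
    # For this linear model, the gradient of the score w.r.t. each token embedding
    # is WEIGHTS / len(tokens); step against the current prediction.
    gradient = [w / len(tokens) for w in WEIGHTS]
    direction = -1 if score(tokens) > 0 else 1
    adversarial = []
    for token in tokens:
        perturbed = tuple(
            v + direction * epsilon * sign(g)
            for v, g in zip(EMBEDDINGS[token], gradient)
        )
        # Convert the perturbed vector back into text via the closest embedding.
        adversarial.append(min(EMBEDDINGS, key=lambda w: dist(EMBEDDINGS[w], perturbed)))
    return adversarial


original = ["the", "film", "good"]
adversarial = attack(original, epsilon=0.6)
print(original, "score:", round(score(original), 3))
print(adversarial, "score:", round(score(adversarial), 3))
```

With a real classifier the gradient would come from backpropagation, and the smallest epsilon that flips the prediction could serve as one possible robustness measure.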

This paper was accepted for presentation and publication at the LXAI workshop at NeurIPS 2022 in New Orleans, LA. Download the paper here: [ bib |  .pdf ]

(Editorial) Emerging Technologies, Evolving Threats: Next-Generation Security Challenges

Volume 3, Issue 3 of the IEEE Transactions on Technology and Society has been officially published, with great contributions on the security challenges posed by emerging technologies and their effects on society.

Our editorial piece is freely accessible; it briefly introduces the research in this issue and explains why these topics are relevant. It also touches on GPT and DALL-E as examples of the great advances of AI and of some ethical considerations around them. Take a look:

T. Bonaci, K. Michael, P. Rivas, L. J. Robertson and M. Zimmer, “Emerging Technologies, Evolving Threats: Next-Generation Security Challenges,” in IEEE Transactions on Technology and Society, vol. 3, no. 3, pp. 155-162, Sept. 2022, doi: 10.1109/TTS.2022.3202323.

NSF Award: Using NLP to Identify Suspicious Transactions in Omnichannel Online C2C Marketplaces

Baylor University has been awarded funding under the SaTC program for Enabling Interdisciplinary Collaboration. The grant is led by Principal Investigator Dr. Pablo Rivas and an amazing group of multidisciplinary researchers:

  • Dr. Gissella Bichler from California State University San Bernardino, Center for Criminal Justice Research, School of Criminology and Criminal Justice.
  • Dr. Tomas Cerny is at Baylor University in the Computer Science Department, leading software engineering research.
  • Dr. Laurie Giddens from the University of North Texas, a faculty member at the G. Brint Ryan College of Business.
  • Dr. Stacy Petter is at Wake Forest University in the School of Business. She and Dr. Giddens have extensive research experience and funding in the area of human trafficking.
  • Dr. Javier Turek, a Research Scientist in Machine Learning at Intel Labs, is our collaborator in matters related to machine learning for natural language processing.

We also have two Ph.D. students working on this project: Alejandro Rodriguez and Korn Sooksatra.

This project was motivated by the growing trend of people buying and selling goods and services directly from other people via online marketplaces. While many online marketplaces enable transactions among reputable buyers and sellers, some platforms are vulnerable to suspicious transactions. This project investigates whether it is possible to automate the detection of illegal goods or services within online marketplaces. First, the project team will analyze the text of online advertisements and marketplace policies to identify indicators of suspicious activity. Then, the team will adapt the findings to a specific context to locate stolen motor vehicle parts advertised via online marketplaces. Together, the work will lead to general ways to identify signals of illegal online sales that can be used to help people choose trustworthy marketplaces and avoid illicit actors. This project will also provide law enforcement agencies and online marketplaces with insights to gather evidence on illicit goods or services on those marketplaces.

This research assesses the feasibility of modeling illegal activity in online consumer-to-consumer (C2C) platforms, using platform characteristics, seller profiles, and advertisements to prioritize investigations using actionable intelligence extracted from open-source information. The project is organized around three main steps. First, the research team will combine knowledge from computer science, criminology, and information systems to analyze online marketplace technology platform policies and identify platform features, policies, and terms of service that make platforms more vulnerable to criminal activity. Second, building on the understanding of platform vulnerabilities developed in the first step, the researchers will generate and train deep learning-based language models to detect illicit online commerce. Finally, to assess the generalizability of the identified markers, the investigators will apply the models to markets for motor vehicle parts, a licit marketplace that sometimes includes sellers offering stolen goods. This project establishes a cross-disciplinary partnership among a diverse group of researchers from different institutions and academic disciplines with collaborators from law enforcement and industry to develop practical, actionable insights.

Self-supervised modeling. After providing a corpus associated with a C2C domain of interest and ontologies, we will extract features followed by attention mechanisms for self-supervised and supervised tasks. The self-supervised models include the completion of missing information and domain-specific text encoding for learning representations. Then supervised tasks will leverage these representations to learn the relationships with targets.
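As a rough illustration of the "completion of missing information" task mentioned in the caption, the sketch below builds masked training pairs from a tokenized advertisement. It is only an assumption about how such pairs could be prepared, not the project's pipeline; the `[MASK]` token, the 15% masking rate, the function name, and the example text are all hypothetical.

```python
import random

MASK = "[MASK]"  # hypothetical placeholder token


def make_masked_example(tokens, mask_prob=0.15, rng=None):
    """Hide random tokens; the hidden originals become prediction targets."""
    rng = rng or random.Random()
    masked, targets = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK)
            targets.append(tok)   # the model must recover this token
        else:
            masked.append(tok)
            targets.append(None)  # no loss is computed at this position
    return masked, targets


if __name__ == "__main__":
    # Made-up advertisement text, for illustration only.
    ad = "selling used car door panel cash only".split()
    masked, targets = make_masked_example(ad, rng=random.Random(0))
    print(masked)
    print(targets)
```

Representations learned from such a completion task could then be reused by a supervised model, as the caption describes.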

Dr. Bejarano’s Work Is Recognized by Amazon

According to the World Federation of the Deaf, there are more than 70 million deaf people worldwide, and more than 80% of them live in developing countries. Recent research by Dr. Gissella Bejarano, our very own postdoctoral research scientist, has been recognized for its impact on computer vision and speech recognition, providing opportunities to help individuals with disabilities. With support from AWS, Dr. Bejarano is finding better ways to translate Peruvian Sign Language using computer vision and natural language processing.

Read more about this in this release by AWS.