Diving Into Large Language Models: An Exploration of ChatGPT and Its Alternatives

An abstract illustration that depicts a central hub or nucleus from which lines and arrows radiate outwards to represent the different layers.

Large Language Models (LLMs) have become a hot topic in the world of machine learning, with chatbots like ChatGPT and other models gaining widespread popularity. However, keeping up with the latest research and advancements in this rapidly evolving field can be challenging. To help you catch up, we’ve compiled a list of 11 essential research papers that every LLM enthusiast should read, followed by three papers on ethical concerns. From the original Transformer architecture to recent innovations in efficiency and alignment, these papers will give you a comprehensive understanding of the field and help you stay ahead of the curve. So whether you’re a seasoned LLM practitioner or just getting started, read on to discover the key papers that will take your understanding of this exciting field to the next level.

Foundational Papers on LLM Architecture and Pretraining:

  • “Attention is All You Need” by Vaswani et al.: This paper introduces the Transformer architecture, which uses scaled dot-product attention to process sequences of tokens. It has since become the basis for many state-of-the-art LLMs. (https://arxiv.org/abs/1706.03762)
  • “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding” by Devlin et al.: This paper describes BERT, a powerful LLM that uses masked language modeling to pre-train a bidirectional Transformer encoder. BERT has achieved impressive results on various natural language processing tasks. (https://arxiv.org/abs/1810.04805)
  • “Improving Language Understanding by Generative Pre-Training” by Radford et al.: This paper introduces GPT, an LLM that uses a Transformer decoder to generate text based on a given prompt. It was one of the first models to demonstrate the effectiveness of large-scale unsupervised pretraining. (https://www.cs.ubc.ca/~amuham01/LING530/papers/radford2018improving.pdf)
  • “BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension” by Lewis et al.: BART is an LLM that combines elements of both encoder and decoder architectures and can be fine-tuned for a variety of natural language tasks. (https://arxiv.org/abs/1910.13461)
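To make the scaled dot-product attention mentioned in the first paper above more concrete, here is a minimal NumPy sketch of the formula softmax(QKᵀ/√d_k)V. The array shapes and random inputs are illustrative assumptions, and the sketch leaves out multi-head projections, masking, and everything else a full Transformer layer needs.

```python
import numpy as np

def softmax(scores, axis=-1):
    # Subtract the row maximum before exponentiating for numerical stability.
    shifted = scores - scores.max(axis=axis, keepdims=True)
    weights = np.exp(shifted)
    return weights / weights.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Return softmax(Q K^T / sqrt(d_k)) V together with the attention weights."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # similarity of every query to every key
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ V, weights

# Illustrative random inputs (assumed shapes, not taken from the paper).
rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))    # 4 query tokens, key dimension 8
K = rng.normal(size=(6, 8))    # 6 key tokens
V = rng.normal(size=(6, 16))   # 6 value vectors of dimension 16
output, weights = scaled_dot_product_attention(Q, K, V)
```

Each row of `weights` is a probability distribution over the key tokens, so every output row is a weighted average of the value vectors.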

Methods for Improving LLM Efficiency:

  • “FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness” by Dao et al.: This paper proposes FlashAttention, an IO-aware algorithm that computes exact attention in tiles to reduce reads and writes to GPU memory, which lowers memory consumption and speeds up training on long sequences. (https://arxiv.org/abs/2205.14135)
  • “Cramming: Training a Language Model on a Single GPU in One Day” by Geiping and Goldstein: This paper investigates how well a Transformer-based language model can perform when it is trained from scratch on a single GPU for one day. (https://arxiv.org/abs/2212.14034)
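One idea behind memory-efficient attention can be sketched in a few lines: process the keys and values block by block while keeping a running softmax, so the full attention matrix is never stored. The NumPy code below is only an illustration under that assumption. It is not the fused GPU kernel from the FlashAttention paper, and the function names and block size are hypothetical.

```python
import numpy as np

def reference_attention(Q, K, V):
    # Standard attention that materializes the full score matrix.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return (weights / weights.sum(axis=-1, keepdims=True)) @ V

def blockwise_attention(Q, K, V, block_size=2):
    # Same result, but keys/values are visited one block at a time.
    n_q, d_k = Q.shape
    running_max = np.full(n_q, -np.inf)   # running row-wise maximum of the scores
    running_sum = np.zeros(n_q)           # running softmax denominator
    acc = np.zeros((n_q, V.shape[-1]))    # running unnormalized output
    for start in range(0, K.shape[0], block_size):
        K_blk = K[start:start + block_size]
        V_blk = V[start:start + block_size]
        scores = Q @ K_blk.T / np.sqrt(d_k)
        new_max = np.maximum(running_max, scores.max(axis=-1))
        scale = np.exp(running_max - new_max)   # rescales the old statistics
        p = np.exp(scores - new_max[:, None])
        running_sum = running_sum * scale + p.sum(axis=-1)
        acc = acc * scale[:, None] + p @ V_blk
        running_max = new_max
    return acc / running_sum[:, None]

rng = np.random.default_rng(0)
Q = rng.normal(size=(5, 8))
K = rng.normal(size=(7, 8))
V = rng.normal(size=(7, 4))
out_block = blockwise_attention(Q, K, V, block_size=3)
out_ref = reference_attention(Q, K, V)
```

Both functions return the same values up to floating-point error, which is the sense in which this kind of tiling is exact rather than approximate.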

Methods for Controlling LLM Outputs:

  • “Training Language Models to Follow Instructions with Human Feedback” by Ouyang et al.: This paper introduces InstructGPT, a family of GPT-3 models fine-tuned with reinforcement learning from human feedback (RLHF) so that their outputs follow user instructions more closely. (https://arxiv.org/abs/2203.02155)
  • “Constitutional AI: Harmlessness from AI Feedback” by Bai et al.: This paper proposes training a harmless AI assistant through self-critique and AI-generated feedback guided by a written set of principles (a “constitution”), reducing the need for human labels that identify harmful outputs. (https://arxiv.org/abs/2212.08073)

Alternatives to ChatGPT:

  • “BLOOM: A 176B-Parameter Open-Access Multilingual Language Model” by the BigScience Workshop (Le Scao et al.): BLOOM is an open-access, 176-billion-parameter multilingual LLM built through a collaboration of hundreds of researchers. (https://arxiv.org/abs/2211.05100)
  • “Improving Alignment of Dialogue Agents via Targeted Human Judgements” by Glaese et al.: This paper introduces Sparrow, a dialogue agent developed by DeepMind that is trained with human feedback to be more helpful, correct, and harmless, and that can cite evidence for its factual claims. (https://arxiv.org/abs/2209.14375)
  • “BlenderBot 3: A Deployed Conversational Agent that Continually Learns to Responsibly Engage” by Shuster et al.: BlenderBot 3 is a conversational LLM developed by Meta that can search the internet for information to incorporate into its responses. (https://arxiv.org/abs/2208.03188)

Important Ethical Concerns Regarding LLMs:

  • “On the Opportunities and Risks of Foundation Models” by Rishi Bommasani et al. This paper discusses the opportunities and risks associated with “foundation models,” a new class of machine learning models trained on large and diverse datasets. The paper highlights the technical, social, and ethical challenges of deploying foundation models in various domains. (https://arxiv.org/abs/2108.07258)
  • “GPT-3: Its Nature, Scope, Limits, and Consequences” by Luciano Floridi & Massimo Chiriatti. This paper examines the capabilities and limitations of GPT-3, a state-of-the-art language model, and argues that it is not designed to pass tests of mathematical, semantic, or ethical questions. The paper concludes that GPT-3 is not the beginning of a general form of artificial intelligence. (https://link.springer.com/article/10.1007/s11023-020-09548-1)
  • “On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜” by Emily M. Bender et al. This paper raises concerns about the risks associated with LLMs like GPT-3, including their environmental and financial costs, and recommends strategies for mitigating those risks. (https://dl.acm.org/doi/abs/10.1145/3442188.3445922)

Before you go ahead and start reading these papers, remember that LLMs such as ChatGPT and its alternatives have revolutionized NLP and hold immense potential for a wide range of applications. However, we must also be mindful of the ethical concerns surrounding these models, such as potential biases and risks of misuse. As the field continues to evolve, we must prioritize ethical considerations and work towards developing models that align with human values and promote the greater good. With the right approach, large language models can enable us to build a more inclusive and equitable future where AI and human collaboration can drive innovation and positive change.

Students on a Mission: Tackling Online Criminal Activity Under NSF’s REU Program

From left to right: Austin, Patrick, Mia, Misty, Garrett, and Andrew.

The Baylor.AI lab is thrilled to announce the launch of a new research stage in our project that uses NLP to identify suspicious transactions in omnichannel online C2C marketplaces. Under the guidance of Dr. Pablo Rivas and Dr. Tomas Cerny, two groups of undergraduate students participating in NSF’s REU program will contribute to this exciting project.

Misty and Andrew, under the direction of Dr. Pablo Rivas, will be working on designing data collection strategies for the project. Their goal is to gather relevant information to support the research and ensure the accuracy of the findings.

Meanwhile, under Dr. Tomas Cerny’s direction, Patrick, Mia, Austin, and Garrett will focus on data visualization and large graph understanding. Their role is crucial in helping to understand and interpret the data collected so far.

The research project investigates the feasibility of automating the detection of illegal goods or services within online marketplaces. As more people turn to online marketplaces for buying and selling goods and services, it is becoming increasingly important to ensure the safety of these transactions.

The project will first analyze the text of online advertisements and marketplace policies to identify indicators of suspicious activity. The findings will then be adapted to a specific context – locating stolen motor vehicle parts advertised via online marketplaces – to determine general ways to identify signals of illegal online sales.

The project brings together the expertise of computer science, criminology, and information systems to analyze online marketplace technology platform policies and identify platform features and policies that make platforms more vulnerable to criminal activity. Using this information, the researchers will generate and train deep learning-based language models to detect illicit online commerce. The models will then be applied to markets for motor vehicle parts to assess their effectiveness.

This research project represents a significant step forward in the fight against illegal activities within online marketplaces. The project results will provide law enforcement agencies and online marketplaces with valuable insights and evidence to help them crack down on illicit goods or services sold on their platforms.

We are incredibly excited to see what Misty, Andrew, Patrick, Mia, Austin, and Garrett will accomplish through this project. We can’t wait to see their impact on online criminal activity research. Stay tuned for updates on their progress and more information about this cutting-edge project.

Sic’em, Bears!

Power of Data In Quantum Machine Learning

This week at the lab, we read the following paper, and here is our summary:

Huang, Hsin-Yuan, Michael Broughton, Masoud Mohseni, Ryan Babbush, Sergio Boixo, Hartmut Neven, and Jarrod R. McClean. “Power of data in quantum machine learning.” Nature Communications 12, no. 1 (2021): 2631.

Summary

This work focuses on the advancement of quantum technologies and their impact on machine learning. The two paths towards the quantum enhancement of machine learning include using the power of quantum computing to improve the training process of existing classical models and using quantum models to generate correlations between variables that are inefficient to represent through classical computation. The authors show that this picture is incomplete in machine learning problems where some training data are provided, as the provided data can elevate classical models to rival quantum models. The authors present a flowchart for testing potential quantum prediction advantage based on prediction error bounds for training classical and quantum ML methods based on kernel functions. This elevation of classical models through some training samples is illustrative of the power of data. The authors also show that, “training a specific classical ML model on a collection of $N$ training examples $(\mathbf{x}, y = f(\mathbf{x}))$ would give rise to a prediction model $h(\mathbf{x})$ with

    \[ \mathbb{E}_{\mathbf{x}}|h(\mathbf{x})-f(\mathbf{x})|\leq c \sqrt{p^2/N} \tag{1} \]

for a constant $c > 0$. Hence, with $N \approx p^2/\epsilon^2$ training data, one can train a classical ML model to predict the function $f(\mathbf{x})$ up to an additive prediction error $\epsilon$.” They also show that a slight geometric difference between kernel functions defined by classical and quantum ML guarantees similar or better performance in prediction by classical ML. On the other hand, a sizeable geometric difference indicates the possibility of a large prediction advantage using the quantum ML model.
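To give a feel for the two quantities in the summary above, the sketch below computes the sample count N ≈ p²/ε² from the quoted bound and a geometric difference between two kernel matrices. The geometric difference is written here as g = sqrt(‖√K_Q · K_C⁻¹ · √K_Q‖), with ‖·‖ the spectral norm. This is our reading of the paper's definition and should be treated as an assumption. The function names, the small regularization term, and the toy kernel matrices are also hypothetical.

```python
import numpy as np

def samples_needed(p, eps):
    # N ~ p^2 / eps^2 from the quoted bound, with the constant c absorbed.
    return int(np.ceil(p ** 2 / eps ** 2))

def psd_sqrt(K):
    # Square root of a symmetric positive semi-definite matrix via eigendecomposition.
    vals, vecs = np.linalg.eigh(K)
    vals = np.clip(vals, 0.0, None)
    return (vecs * np.sqrt(vals)) @ vecs.T

def geometric_difference(K_classical, K_quantum, reg=1e-10):
    # Assumed definition: g = sqrt(|| sqrt(K_Q) K_C^{-1} sqrt(K_Q) ||_spectral).
    n = K_classical.shape[0]
    sqrt_KQ = psd_sqrt(K_quantum)
    KC_inv = np.linalg.inv(K_classical + reg * np.eye(n))  # reg only for stability
    M = sqrt_KQ @ KC_inv @ sqrt_KQ
    return float(np.sqrt(np.linalg.norm(M, ord=2)))

# Toy positive-definite "kernel" matrices (illustrative, not from the paper).
rng = np.random.default_rng(0)
A = rng.normal(size=(5, 3))
K_c = A @ A.T + np.eye(5)
g_same = geometric_difference(K_c, K_c)        # identical kernels
g_scaled = geometric_difference(K_c, 4 * K_c)  # second kernel scaled by 4
```

With identical kernels the geometric difference is 1, and scaling the second kernel by 4 doubles it. In the flowchart described above, a small value suggests the classical kernel should do about as well, while a large value leaves room for a quantum advantage.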

Additionally, the authors introduced “projected quantum kernels” and demonstrated, through empirical results, that these outperformed all tested classical models in prediction error. This work provides a guidebook for generating ML problems that showcase the separation between quantum and classical models.

Intellectual Merit

This work provides a theoretical and computational framework for comparing classical and quantum ML models. The authors develop prediction error bounds for training classical and quantum ML methods based on kernel functions, which provide provable guarantees and are very flexible in the functions they can learn. The authors also develop a flowchart for testing potential quantum prediction advantage, a function-independent prescreening that allows one to evaluate the possibility of better performance. The authors provide a constructive example of a discrete log feature map, which gives a provable separation for their kernel. They rule out many existing models in the literature, providing a powerful sieve for focusing the development of new data encodings.

Broader Impact

The authors’ contributions to the field of quantum technologies and machine learning have significant broader impacts. The development of a flowchart for testing potential quantum prediction advantage provides a tool for researchers and practitioners to determine the possibility of better performance using quantum ML models. The authors’ framework can also be used to compare and construct hard classical models, such as hash functions, which have applications in cryptography and secure communication. The authors’ work has the potential to accelerate the development of new data encodings, leading to more efficient and accurate machine learning models. This has far-reaching implications for various applications, including image recognition, text translation, and even physics applications, where machine learning can revolutionize how we analyze and interpret data. The paper is a collaboration among researchers at three well-known institutions: Google Quantum AI, the Institute for Quantum Information and Matter at Caltech, and the Department of Computing and Mathematical Sciences at Caltech.

Supercomputing Leverages Quantum Machine Learning and Grover’s Algorithm

Khanal, B., Orduz, J., Rivas, P. et al. Supercomputing leverages quantum machine learning and Grover’s algorithm. J Supercomput 79, 6918–6940 (2023). https://doi.org/10.1007/s11227-022-04923-4. [ bib |  .pdf ]

Quantum computing, a field that has drawn significant attention recently, promises to revolutionize how we approach computational problems. In Bikram Khanal’s paper titled “Supercomputing leverages quantum machine learning and Grover’s algorithm,” we delve deep into the intricacies of quantum computing, with a particular focus on Grover’s algorithm and its potential applications in quantum machine learning. This journal article is an extended version of an earlier conference paper found here.

Understanding Quantum Computing and Grover’s Algorithm

At the heart of our paper is a comprehensive discussion of the basics of quantum computing. We shed light on Grover’s quantum algorithm, which promises a quadratic speedup over classical algorithms for unstructured search. Our team conducted an experiment simulating classical logic circuits by exploiting the power of amplitude amplification, a core principle behind Grover’s algorithm.
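Amplitude amplification is easy to see in a small classical simulation. The sketch below is a generic state-vector illustration of Grover’s algorithm with a single marked item, not the circuit or oracle used in the paper. The function name and the choice of three qubits are assumptions made for the example.

```python
import numpy as np

def grover_search(n_qubits, marked, n_iterations=None):
    """Simulate Grover's algorithm for one marked basis state; return probabilities."""
    N = 2 ** n_qubits
    if n_iterations is None:
        # About (pi / 4) * sqrt(N) iterations maximize the success probability.
        n_iterations = int(np.floor(np.pi / 4 * np.sqrt(N)))
    state = np.full(N, 1 / np.sqrt(N))     # uniform superposition
    for _ in range(n_iterations):
        state[marked] *= -1                # oracle: flip the sign of the marked amplitude
        state = 2 * state.mean() - state   # diffusion: inversion about the mean
    return state ** 2                      # amplitudes are real, so square them

probs = grover_search(n_qubits=3, marked=5)
best_guess = int(np.argmax(probs))
```

For eight basis states, two iterations already concentrate most of the probability on the marked state, whereas a classical search needs about four lookups on average.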

The Quantum Advantage

One of the primary takeaways from our research is the potential advantage quantum computing and quantum machine learning can offer over classical counterparts. By harnessing the efficiency of quantum computers, we can significantly reduce the reliance on supercomputing power for executing complex programs. This is particularly relevant for problems that pose challenges for classical computing methods.

Exploring Quantum Machine Learning

Our paper also delves into two promising approaches in quantum machine learning:

  1. Variational Quantum Circuits: These are parameterized quantum circuits whose parameters can be tuned, typically by a classical optimizer, to optimize a quantum computation.
  2. Kernel-based Quantum Machine Learning: This approach leverages the concept of kernels, which are used in classical machine learning for various tasks, and adapts them for quantum computations.

We believe the intersection of quantum algorithms and machine learning is ripe for exploration. While there’s a lot of potential, it’s also a domain that requires rigorous research to unearth solutions that can genuinely outperform classical machine learning methods. Kernel-based approaches, in particular, hold significant promise in the near future of quantum machine learning and supercomputing.
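As a small illustration of the first approach listed above, the following sketch simulates a one-qubit variational circuit in NumPy. It applies a rotation RY(θ) to |0⟩, measures the expectation of Pauli-Z, and tunes θ with gradient descent using the parameter-shift rule. The circuit, cost function, learning rate, and step count are all assumptions chosen to keep the example tiny. They are not taken from the paper.

```python
import numpy as np

PAULI_Z = np.array([[1.0, 0.0], [0.0, -1.0]])

def ry(theta):
    # Single-qubit rotation about the Y axis.
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

def expectation_z(theta):
    # Prepare RY(theta)|0> and return <Z>, which equals cos(theta).
    state = ry(theta) @ np.array([1.0, 0.0])
    return float(state @ PAULI_Z @ state)

def parameter_shift_grad(theta):
    # Exact gradient from two shifted circuit evaluations.
    return 0.5 * (expectation_z(theta + np.pi / 2) - expectation_z(theta - np.pi / 2))

theta = 0.3                                  # arbitrary starting parameter
for _ in range(200):
    theta -= 0.2 * parameter_shift_grad(theta)   # gradient descent on <Z>
final_cost = expectation_z(theta)
```

The optimizer drives θ toward π, where the qubit is in |1⟩ and ⟨Z⟩ reaches its minimum of −1. On real hardware the two shifted evaluations would be estimated from repeated measurements rather than computed exactly.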

Looking Ahead

Our journey into the world of quantum computing doesn’t end here. We have charted out an exciting roadmap for our future research:

  • Real-world Data Testing: We aim to test our algorithm using actual data from classical circuits. This will involve parsing the circuit and feeding it into our algorithm.
  • Optimizing Computational Complexity: Our goal is to enhance the efficiency of our approach, especially when compared to classical methods.
  • Leveraging Grover’s Algorithm for Machine Learning: We believe many classification problems in machine learning can be reframed as search problems, making them ideal candidates for Grover’s algorithm.
  • Kernel Methods and Grover’s Algorithm: In our future work, we plan to reformulate machine learning problems using kernel methods, aiming to solve them efficiently as search problems using Grover’s algorithm.

Finally, we are at the cusp of some groundbreaking discoveries that can redefine the supercomputing landscape. We invite readers and fellow researchers to join us on this exciting journey and keep a close watch on the advancements in this domain. Read the paper here [ bib |  .pdf ].

Quantum circuit for Grover’s algorithm with a user-defined oracle on the clauses |q0⟩ AND |q3⟩, |q1⟩ XOR |q2⟩, and |q2⟩ AND |q3⟩. Note that we need seven qubits in total. For c clauses, three in this case, we require c additional qubits and one output qubit, resulting in an (n + c + 1)-qubit quantum circuit.

On Adversarial Examples for Text Classification by Perturbing Latent Representations

With recent advances in deep learning, several applications in text classification have improved significantly. However, this improvement comes at a cost: deep learning models are vulnerable to adversarial examples, which indicates that they are not very robust.

Fortunately, two students in our lab, Korn Sooksatra and Bikram Khanal, noticed that the input of a text classifier is discrete, which by itself can shield the classifier from state-of-the-art attacks. Nonetheless, previous works have generated black-box attacks that successfully manipulate the discrete values of the input to find adversarial examples. Therefore, instead of changing the discrete values, they transform the input into its embedding vector of real values and perform state-of-the-art white-box attacks on that vector. Then, they convert the perturbed embedding vector back into text and call the result an adversarial example. In summary, PhD candidates Sooksatra and Khanal created a framework that measures the robustness of a text classifier by using the gradients of the classifier.
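The general recipe described above (embed, perturb along the gradient, map back to text) can be sketched with a toy model. Everything in the code below is a hypothetical stand-in: a random embedding table, a logistic classifier on the mean embedding, an FGSM-style sign step, and a nearest-neighbor projection back to the vocabulary. It illustrates the idea only and is not the framework from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["good", "great", "fine", "bad", "awful", "poor", "movie", "plot"]
E = rng.normal(size=(len(vocab), 4))   # hypothetical embedding table (one row per token)
w = rng.normal(size=4)                 # hypothetical classifier weights
b = 0.0

def predict_proba(embeddings):
    # Logistic classifier applied to the mean token embedding.
    logit = embeddings.mean(axis=0) @ w + b
    return 1.0 / (1.0 + np.exp(-logit))

def embedding_gradient(embeddings, label):
    # Gradient of the cross-entropy loss with respect to each token embedding.
    p = predict_proba(embeddings)
    per_token = (p - label) * w / len(embeddings)
    return np.tile(per_token, (len(embeddings), 1))

def nearest_tokens(embeddings):
    # Map each (perturbed) embedding to the closest row of the embedding table.
    dists = np.linalg.norm(embeddings[:, None, :] - E[None, :, :], axis=-1)
    return dists.argmin(axis=1)

def embedding_attack(token_ids, label, epsilon):
    clean = E[token_ids]
    # FGSM-style step: move each embedding in the direction that increases the loss.
    perturbed = clean + epsilon * np.sign(embedding_gradient(clean, label))
    return perturbed, nearest_tokens(perturbed)

token_ids = np.array([0, 6, 1])        # "good movie great"
perturbed, adv_ids = embedding_attack(token_ids, label=1, epsilon=0.5)
adversarial_text = [vocab[i] for i in adv_ids]
```

Whether the projected tokens actually change depends on ε and on how densely the vocabulary fills the embedding space, which is part of what makes attacks on discrete text harder than attacks on images.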

This paper was accepted for presentation and publication at the LXAI workshop at NeurIPS 2022 in New Orleans, LA. Download the paper here: [ bib |  .pdf ]

(Editorial) Emerging Technologies, Evolving Threats: Next-Generation Security Challenges

Volume 3, Issue 3 of the IEEE Transactions on Technology and Society has been officially published, with great contributions regarding security challenges posed by emerging technologies and their effects on society.

Our editorial piece is freely accessible and briefly introduces the research in this issue and explains why these topics are relevant. It also touches on GPT and DALL·E as a means to show the great advances of AI and some of the ethical considerations around them. Take a look:

T. Bonaci, K. Michael, P. Rivas, L. J. Robertson and M. Zimmer, “Emerging Technologies, Evolving Threats: Next-Generation Security Challenges,” in IEEE Transactions on Technology and Society, vol. 3, no. 3, pp. 155-162, Sept. 2022, doi: 10.1109/TTS.2022.3202323.

NSF Award: Using NLP to Identify Suspicious Transactions in Omnichannel Online C2C Marketplaces

Baylor University has been awarded funding under the SaTC program for Enabling Interdisciplinary Collaboration. The grant is led by Principal Investigator Dr. Pablo Rivas and an amazing group of multidisciplinary researchers formed by:

  • Dr. Gisela Bichler from California State University San Bernardino, Center for Criminal Justice Research, School of Criminology and Criminal Justice.
  • Dr. Tomas Cerny is at Baylor University in the Computer Science Department, leading software engineering research.
  • Dr. Laurie Giddens from the University of North Texas, a faculty member at the G. Brint Ryan College of Business.
  • Dr. Stacy Petter is at Wake Forest University in the School of Business. She and Dr. Giddens have extensive research experience and funding in the area of human trafficking.
  • Dr. Javier Turek, a Research Scientist in Machine Learning at Intel Labs, is our collaborator in matters related to machine learning for natural language processing.

We also have two Ph.D. students working on this project: Alejandro Rodriguez and Korn Sooksatra.

This project was motivated by the increasing pattern of people buying and selling goods and services directly from other people via online marketplaces. While many online marketplaces enable transactions among reputable buyers and sellers, some platforms are vulnerable to suspicious transactions. This project investigates whether it is possible to automate the detection of illegal goods or services within online marketplaces. First, the project team will analyze the text of online advertisements and marketplace policies to identify indicators of suspicious activity. Then, the team will adapt the findings to a specific context to locate stolen motor vehicle parts advertised via online marketplaces. Together, the work will lead to general ways to identify signals of illegal online sales that can be used to help people choose trustworthy marketplaces and avoid illicit actors. This project will also provide law enforcement agencies and online marketplaces with insights to gather evidence on illicit goods or services on those marketplaces.

This research assesses the feasibility of modeling illegal activity in online consumer-to-consumer (C2C) platforms, using platform characteristics, seller profiles, and advertisements to prioritize investigations using actionable intelligence extracted from open-source information. The project is organized around three main steps. First, the research team will combine knowledge from computer science, criminology, and information systems to analyze online marketplace technology platform policies and identify platform features, policies, and terms of service that make platforms more vulnerable to criminal activity. Second, building on the understanding of platform vulnerabilities developed in the first step, the researchers will generate and train deep learning-based language models to detect illicit online commerce. Finally, to assess the generalizability of the identified markers, the investigators will apply the models to markets for motor vehicle parts, a licit marketplace that sometimes includes sellers offering stolen goods. This project establishes a cross-disciplinary partnership among a diverse group of researchers from different institutions and academic disciplines with collaborators from law enforcement and industry to develop practical, actionable insights.

Self-supervised modeling. After providing a corpus associated with a C2C domain of interest and ontologies, we will extract features followed by attention mechanisms for self-supervised and supervised tasks. The self-supervised models include the completion of missing information and domain-specific text encoding for learning representations. Then supervised tasks will leverage these representations to learn the relationships with targets.

Dr. Bejarano’s Work Is Recognized by Amazon

According to the World Federation of the Deaf, there are more than 70 million deaf people worldwide, and more than 80% of them live in developing countries. Recent research by Dr. Gissella Bejarano, our very own postdoctoral research scientist, has been recognized for its impact on computer vision and speech recognition, providing opportunities to help individuals with disabilities. With support from AWS, Dr. Bejarano is finding better ways to translate Peruvian Sign Language using computer vision and natural language processing.

Read more about this in this release by AWS.

SICEM: A Sensitivity-Inspired Constrained Evaluation Method for Adversarial Attacks on Classifiers with Occluded Input Data

In the rapidly evolving field of artificial intelligence, understanding the sensitivity of models to adversarial attacks is crucial. In our recent paper, Korn Sooksatra introduces the Sensitivity-inspired constrained evaluation method (SICEM) to address this concern.

Sooksatra, K., Rivas, P. Evaluation of adversarial attacks sensitivity of classifiers with occluded input data. Neural Comput & Applic 34, 17615–17632 (2022). https://doi.org/10.1007/s00521-022-07387-y

Understanding SICEM

Our proposed method, SICEM, evaluates the vulnerability of an incomplete input against an adversarial attack in comparison to a complete one. This is achieved by leveraging the Jacobian matrix concept. The sensitivity of the target classifier’s output to each attribute of the input is calculated, providing a comprehensive understanding of how changes in the input can affect the output.

    \[ s(x,y)_i =  \left|\min \left(0, \frac{\partial Z(x)_y}{\partial x_i} \cdot \left(\sum_{y^{'} \neq y} \frac{\partial Z(x)_{y^{'}}}{\partial x_i}\right) \cdot C(y, 1, 0)_i\right)\right| \]

This sensitivity score gives us an insight into how much each attribute of the input contributes to the output’s sensitivity. The score is then used to estimate the overall sensitivity of the given input and its mask.

    \[ S(x, M)_y = \sum_{i=0}^{n-1} (s(x, y)_i \cdot M_i) \]

The sensitivity ratio then provides a comparative measure of how sensitive the classifier’s output is for an incomplete input versus a complete one.
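The two formulas above translate almost directly into NumPy once a Jacobian is available. In the sketch below the Jacobian comes from a toy linear classifier Z(x) = Wx, the term C(y, 1, 0)_i is treated as a per-attribute constraint vector supplied by the caller (defaulting to all ones), and the ratio is taken as S(x, M)_y divided by the score of the complete input. These three choices are assumptions made for illustration, and the function names are hypothetical.

```python
import numpy as np

def sensitivity_scores(jacobian, y, constraint=None):
    """s(x, y)_i for every attribute i, given dZ/dx with shape (classes, attributes)."""
    n_features = jacobian.shape[1]
    if constraint is None:
        constraint = np.ones(n_features)   # stand-in for C(y, 1, 0)_i (assumption)
    target_grad = jacobian[y]                         # dZ_y / dx_i
    other_grad = jacobian.sum(axis=0) - jacobian[y]   # sum over y' != y of dZ_y' / dx_i
    return np.abs(np.minimum(0.0, target_grad * other_grad * constraint))

def overall_sensitivity(scores, mask):
    # S(x, M)_y = sum_i s(x, y)_i * M_i
    return float(np.sum(scores * mask))

# Toy linear classifier Z(x) = W x, whose Jacobian is simply W.
W = np.array([[ 1.0, -2.0,  0.5],
              [-1.0,  1.0,  0.5],
              [ 0.5,  1.0, -1.0]])
scores = sensitivity_scores(W, y=0)
complete = overall_sensitivity(scores, np.ones(3))               # nothing occluded
incomplete = overall_sensitivity(scores, np.array([1.0, 0.0, 1.0]))  # attribute 1 occluded
ratio = incomplete / complete
```

Here the occluded attribute carries most of the sensitivity, so the ratio drops well below 1 once it is masked out.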

Results and Implications

Our focus was on an automobile image from the CIFAR-10 dataset. Interestingly, adversarial examples generated by FGSM and IGSM required the same value of $\epsilon$, which was significantly lower than for other images. This can be attributed to the layer-wise linearity of the classifier. Larger inputs, like the automobile image, require a smaller $\epsilon$ to create an adversarial example. However, JSMA required a higher $\epsilon$ because its perturbation is measured with the $L_0$ norm.

Understanding the sensitivity of AI models is paramount in ensuring their robustness against adversarial attacks. The SICEM method provides a comprehensive tool to ensure safer and more reliable AI systems. Read the full paper here [ bib |  .pdf ].

Hybrid Quantum Variational Autoencoders for Representation Learning

One of our recent papers introduces a novel hybrid quantum machine learning approach to unsupervised representation learning by using a quantum variational circuit that is trainable with traditional gradient descent techniques. Access it here: [ bib | .pdf ]

Much of the work related to quantum machine learning has been popularized in recent years. Some of the most notable efforts involve variational approaches (Cerezo 2021, Khoshaman 2018, Yuan 2019). Researchers have shown that these models are effective in complex tasks that warrant further study and open new doors for applied quantum machine learning research. Another popular approach is to perform kernel learning using a quantum approach (Blank 2020, Schuld 2019, Rebentrost 2014). In this case, the kernel-based projection of data $\mathbf{x}$ produces a feasible linear mapping to the desired target $y$ as follows:

    \[ y(\mathbf{x})=\operatorname{sign}\left(\sum_{j=1}^{M} \alpha_{j} k\left(\mathbf{x}_{j}, \mathbf{x}\right)+b\right) \tag{1} \]

for hyperparameters $b, \alpha$ that need to be provided or learned. This enables the creation of some types of support vector machines whose kernels are calculated such that the data $\mathbf{x}$ is processed in the quantum realm. That is, $\left|\mathbf{x}_{j}\right\rangle = 1/\left|\mathbf{x}_{j}\right| \sum_{k=1}^{N}\left(\mathbf{x}_{j}\right)_{k}|k\rangle$. The work of Schuld et al. expands the theory behind this idea and shows that all kernel methods can be quantum machine learning methods. Recently, in 2020, Mari et al. worked on variational models that are hybrid in format. In particular, the authors focused on transfer learning, i.e., the idea of bringing a pre-trained model (or a piece of it) to be part of another model. In the case of Mari et al., the larger model is a computer vision model, e.g., ResNet, which is part of a variational quantum circuit that performs classification. The work we present here follows a similar idea, but we focus on the autoencoder architecture rather than a classification model, and on comparing the representations learned by a classic model and a variational quantum fine-tuned model.
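The kernel classifier and the amplitude encoding written above can be imitated classically for small vectors. The sketch below normalizes each data vector as in the encoding formula, uses the squared overlap |⟨x_j|x⟩|² as the kernel, and evaluates the sign rule. The choice of this fidelity kernel, the toy support vectors, and the coefficients are assumptions made for illustration. On quantum hardware the overlap would be estimated from measurements rather than computed directly.

```python
import numpy as np

def amplitude_encode(x):
    # |x> = x / |x|: the entries of x become the amplitudes of the basis states |k>.
    x = np.asarray(x, dtype=float)
    return x / np.linalg.norm(x)

def fidelity_kernel(x1, x2):
    # k(x1, x2) = |<x1|x2>|^2, one common quantum kernel (assumed here).
    return float(np.abs(amplitude_encode(x1) @ amplitude_encode(x2)) ** 2)

def kernel_classifier(x, support_vectors, alphas, b=0.0):
    # y(x) = sign(sum_j alpha_j * k(x_j, x) + b)
    total = sum(a * fidelity_kernel(xj, x) for a, xj in zip(alphas, support_vectors))
    return int(np.sign(total + b))

# Hypothetical support vectors and coefficients for a two-class toy problem.
support_vectors = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
alphas = [1.0, -1.0]
label_a = kernel_classifier([0.9, 0.1], support_vectors, alphas)
label_b = kernel_classifier([0.2, 1.0], support_vectors, alphas)
```

Because amplitude encoding normalizes its input, this kernel depends only on the direction of each vector, so rescaling a data point does not change its predicted label.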