Giving Thanks for the Pioneering Advances in Machine Learning

As we gather around the table this Thanksgiving, it’s the perfect time to reflect on and express gratitude for the remarkable strides made in machine learning (ML) over recent years. These technical innovations have advanced the field and paved the way for countless applications that enhance our daily lives. Let’s check out some of the most influential ML architectures and algorithms for which we are thankful as a community.

1. The Transformer Architecture

Vaswani et al., 2017

We are grateful for the Transformer architecture, which revolutionized sequence modeling by introducing a novel attention mechanism, eliminating the reliance on recurrent neural networks (RNNs) for handling sequential data.

Key Components:

Self-Attention Mechanism: Computes representations of the input sequence by relating different positions via attention weights.
$\text{Attention}(Q, K, V) = \text{softmax}\left( \frac{Q K^\top}{\sqrt{d_k}} \right) V$
Multi-Head Attention: Allows the model to focus on different positions by projecting queries, keys, and values multiple times with different linear projections. $\text{MultiHead}(Q, K, V) = \text{Concat}(\text{head}_1, ..., \text{head}_h) W^O$ where each head is computed as: $\text{head}_i = \text{Attention}(Q W_i^Q, K W_i^K, V W_i^V)$
Positional Encoding: Adds information about the position of tokens in the sequence since the model lacks recurrence. $\text{PE}_{(pos, 2i)} = \sin\left( \frac{pos}{10000^{2i/d_{\text{model}}}} \right)$ and $\text{PE}_{(pos, 2i+1)} = \cos\left( \frac{pos}{10000^{2i/d_{\text{model}}}} \right)$
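
As a rough illustration of the components listed above, here is a minimal NumPy sketch of single-head scaled dot-product attention and sinusoidal positional encoding. The function names, the toy shapes, and the assumption that $d_{\text{model}}$ is even are ours, and the learned projection matrices of multi-head attention are omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row-wise maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Q: (n_q, d_k), K: (n_k, d_k), V: (n_k, d_v)
    d_k = Q.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d_k))
    return weights @ V, weights

def positional_encoding(num_positions, d_model):
    # Assumes d_model is even: sine on even indices, cosine on odd indices.
    pos = np.arange(num_positions)[:, None]
    i = np.arange(d_model // 2)[None, :]
    angles = pos / (10000 ** (2 * i / d_model))
    pe = np.zeros((num_positions, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

# Toy usage: self-attention over 5 random token embeddings of width 8.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8)) + positional_encoding(5, 8)
out, attn = scaled_dot_product_attention(X, X, X)
```

Each row of `attn` is a probability distribution over input positions, which is what lets every output position draw on the whole sequence in one matrix product.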

Significance: Enabled parallelization in sequence processing, leading to significant speed-ups and improved performance in tasks like machine translation and language modeling.

2. Bidirectional Encoder Representations from Transformers (BERT)

Devlin et al., 2018

We are thankful for BERT, which introduced a method for pre-training deep bidirectional representations by jointly conditioning on both left and right contexts in all layers.

Key Concepts:

Masked Language Modeling (MLM): Randomly masks tokens in the input and predicts them using the surrounding context. Loss Function: $\mathcal{L}_{\text{MLM}} = -\sum_{t \in \mathcal{M}} \log P_{\theta}(x_t | x_{\backslash \mathcal{M}})$ where $\mathcal{M}$ is the set of masked positions.
Next Sentence Prediction (NSP): Predicts whether a given pair of sentences follows sequentially in the original text.

Significance: Achieved state-of-the-art results on a wide range of NLP tasks via fine-tuning, demonstrating the power of large-scale pre-training.

3. Generative Pre-trained Transformers (GPT) Series

Radford et al., 2018, 2019; Brown et al., 2020

We express gratitude for the GPT series, which leverages unsupervised pre-training on large corpora to generate human-like text.

Key Features:

Unidirectional Language Modeling: Predicts the next token $x_t$ given previous tokens $x_{<t}$ . Objective Function: $\mathcal{L}_{\text{LM}} = -\sum_{t=1}^N \log P_{\theta}(x_t | x_{<t})$
Decoder-Only Transformer Architecture: Utilizes masked self-attention to prevent the model from attending to future tokens.
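
The two features above can be sketched in a few lines of NumPy. This is an illustrative toy, not the GPT implementation: the function names are ours, and the scores and probabilities would normally come from a trained model.

```python
import numpy as np

def causal_attention_weights(scores):
    # scores: (T, T) raw attention scores.
    # Position t may only attend to positions <= t.
    T = scores.shape[0]
    future = np.triu(np.ones((T, T), dtype=bool), k=1)  # True above the diagonal
    masked = np.where(future, -np.inf, scores)
    masked = masked - masked.max(axis=-1, keepdims=True)
    e = np.exp(masked)  # exp(-inf) = 0, so future positions get zero weight
    return e / e.sum(axis=-1, keepdims=True)

def next_token_loss(probs, targets):
    # probs: (N, vocab) predicted distributions P(x_t | x_<t)
    # targets: (N,) ids of the tokens that actually came next
    return -np.sum(np.log(probs[np.arange(len(targets)), targets]))

weights = causal_attention_weights(np.zeros((4, 4)))
```

With all-zero scores, the first position attends only to itself and the last position spreads its weight evenly over all four positions.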

Significance: Demonstrated the capability of large language models to perform few-shot learning, adapting to new tasks with minimal task-specific data.

4. Variational Autoencoders (VAEs)

Kingma and Welling, 2013

We appreciate VAEs for introducing a probabilistic approach to autoencoders, enabling generative modeling of complex data distributions.

Key Components:

Encoder Network: Learns an approximate posterior $q_{\phi}(z|x)$ .
Decoder Network: Reconstructs the input from latent variables $z$ , modeling $p_{\theta}(x|z)$ .

Objective Function (Evidence Lower Bound – ELBO): $\mathcal{L}(\theta, \phi; x) = -\text{KL}(q_{\phi}(z|x) \| p_{\theta}(z)) + \mathbb{E}_{q_{\phi}(z|x)}[\log p_{\theta}(x|z)]$ where $p_{\theta}(z)$ is typically a standard normal prior $\mathcal{N}(0, I)$ .
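
A minimal sketch of how the ELBO can be estimated, assuming a diagonal Gaussian posterior (so the KL term to the standard normal prior has a closed form) and a Bernoulli decoder. The `decode` callable and all names here are hypothetical placeholders for a real decoder network.

```python
import numpy as np

def gaussian_kl_to_standard_normal(mu, log_var):
    # Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) ).
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)

def reparameterize(mu, log_var, rng):
    # z = mu + sigma * eps keeps the sample differentiable w.r.t. mu and sigma.
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def elbo(x, mu, log_var, decode, rng):
    # Single-sample Monte Carlo estimate of the ELBO.
    # Assumption: decode(z) returns Bernoulli means for binary data x.
    z = reparameterize(mu, log_var, rng)
    x_hat = np.clip(decode(z), 1e-7, 1 - 1e-7)
    log_lik = np.sum(x * np.log(x_hat) + (1 - x) * np.log(1 - x_hat))
    return log_lik - gaussian_kl_to_standard_normal(mu, log_var)
```

In training, the negative of this quantity would be minimized with respect to both encoder and decoder parameters.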

Significance: Provided a framework for unsupervised learning of latent representations and generative modeling.

5. Generative Adversarial Networks (GANs)

Goodfellow et al., 2014

We are thankful for GANs, which consist of two neural networks, a generator $G$ and a discriminator $D$, competing in a minimax game.

Objective Function: $\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]$ where $p_{\text{data}}$ is the data distribution and $p_z$ is the prior over the latent space.
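
To make the value function concrete, the sketch below estimates $V(D, G)$ from batches of discriminator outputs. The function name and inputs are illustrative; in practice `d_real` and `d_fake` would be produced by the two networks.

```python
import numpy as np

def gan_value(d_real, d_fake):
    # d_real: D(x) on real samples, d_fake: D(G(z)) on generated samples.
    # Both are probabilities strictly between 0 and 1.
    return np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))

# A discriminator that cannot tell real from fake outputs 0.5 everywhere.
v_equilibrium = gan_value(np.full(8, 0.5), np.full(8, 0.5))
```

When the discriminator outputs 0.5 for every input, the value equals $-\log 4$, which is the value of the game at the equilibrium described in the original paper.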

Significance: Enabled the generation of highly realistic synthetic data, impacting image synthesis, data augmentation, and more.

6. Deep Reinforcement Learning

Mnih et al., 2015; Silver et al., 2016

We give thanks for the combination of deep learning with reinforcement learning, leading to agents capable of performing complex tasks.

Key Algorithms:

Deep Q-Networks (DQN): Approximate the action-value function $Q(s, a; \theta)$ using neural networks. Bellman Equation: $Q(s, a) = r + \gamma \max_{a'} Q(s', a'; \theta^{-})$ where $\theta^{-}$ are the parameters of a target network.
Policy Gradient Methods: Optimize the policy $\pi_{\theta}(a|s)$ directly. REINFORCE Gradient Estimate: $\nabla_{\theta} J(\theta) = \mathbb{E}_{\pi_{\theta}} \left[ \nabla_{\theta} \log \pi_{\theta}(a|s) R \right]$ where $R$ is the cumulative reward.
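
A small sketch of both ideas, under stated assumptions: the DQN target is computed from a batch of target-network outputs, with a `dones` flag for terminal states that is our addition, and the REINFORCE term is written for a softmax policy over logits, where $\nabla \log \pi(a)$ with respect to the logits equals the one-hot action minus the policy.

```python
import numpy as np

def dqn_targets(rewards, next_q_target, dones, gamma=0.99):
    # next_q_target: (batch, n_actions) Q-values from the target network.
    # dones: 1.0 where the episode ended, so no bootstrapping happens there.
    return rewards + gamma * (1.0 - dones) * next_q_target.max(axis=1)

def reinforce_grad_logits(logits, action, ret):
    # Single-sample REINFORCE estimate for a softmax policy,
    # taken with respect to the logits rather than network weights.
    p = np.exp(logits - logits.max())
    p /= p.sum()
    one_hot = np.zeros_like(p)
    one_hot[action] = 1.0
    return (one_hot - p) * ret

targets = dqn_targets(
    rewards=np.array([1.0, 0.0]),
    next_q_target=np.array([[1.0, 2.0], [3.0, 4.0]]),
    dones=np.array([0.0, 1.0]),
    gamma=0.5,
)
```

The online network would then be regressed toward `targets`, while the policy parameters would be moved along the REINFORCE estimate.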

Significance: Achieved human-level performance in games like Atari and Go, advancing AI in decision-making tasks.

7. Normalization Techniques

We are grateful for normalization techniques that have improved training stability and performance of deep networks.

Batch Normalization (Ioffe and Szegedy, 2015) Formula: $\hat{x}_i = \frac{x_i - \mu_{\mathcal{B}}}{\sqrt{\sigma_{\mathcal{B}}^2 + \epsilon}}$ where $\mu_{\mathcal{B}}$ and $\sigma_{\mathcal{B}}^2$ are the batch mean and variance.
Layer Normalization (Ba et al., 2016) Formula: $\hat{x}_i = \frac{x_i - \mu}{\sqrt{\sigma^2 + \epsilon}}$ where $\mu$ and $\sigma^2$ are computed over the features of a single sample.
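
The two formulas differ only in the axis over which the statistics are computed, as this sketch shows. It omits the learnable scale and shift parameters and the running averages that batch normalization uses at inference time.

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    # x: (batch, features). Statistics are per feature, across the batch.
    mu = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def layer_norm(x, eps=1e-5):
    # Statistics are per sample, across that sample's features.
    mu = x.mean(axis=1, keepdims=True)
    var = x.var(axis=1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

x = np.random.default_rng(0).normal(loc=3.0, scale=2.0, size=(4, 6))
bn, ln = batch_norm(x), layer_norm(x)
```

Because layer normalization does not depend on the other samples in the batch, it behaves the same at batch size 1, which is one reason it is common in sequence models.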

Significance: Mitigated internal covariate shift, enabling faster and more reliable training.

8. Attention Mechanisms in Neural Networks

Bahdanau et al., 2014; Luong et al., 2015

We appreciate attention mechanisms for allowing models to focus on specific parts of the input when generating each output element.

Key Concepts:

Alignment Scores: Compute the relevance between each encoder hidden state $h_i$ and the previous decoder state $s_{t-1}$. Common Score Functions:
- Dot-product: $\text{score}(h, s) = h^\top s$
- Additive (Bahdanau attention): $\text{score}(h, s) = v_a^\top \tanh(W_a [h; s])$
Context Vector: $c_t = \sum_{i=1}^T \alpha_{t,i} h_i$ where the attention weights $\alpha_{t,i}$ are computed as: $\alpha_{t,i} = \frac{\exp(\text{score}(h_i, s_{t-1}))}{\sum_{k=1}^T \exp(\text{score}(h_k, s_{t-1}))}$
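
The following sketch computes attention weights and a context vector for one decoder step, using either score function. The parameter shapes for the additive score and all variable names are assumptions chosen for illustration.

```python
import numpy as np

def dot_score(h, s):
    # Requires encoder and decoder states of the same width.
    return h @ s

def additive_score(h, s, W_a, v_a):
    # W_a: (d_a, d_h + d_s), v_a: (d_a,)
    return v_a @ np.tanh(W_a @ np.concatenate([h, s]))

def context_vector(H, s_prev, score_fn):
    # H: (T, d_h) encoder states, s_prev: previous decoder state.
    scores = np.array([score_fn(h, s_prev) for h in H])
    scores = scores - scores.max()            # numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()
    return alpha @ H, alpha

rng = np.random.default_rng(0)
H = rng.normal(size=(3, 3))
s_prev = rng.normal(size=2)
W_a, v_a = rng.normal(size=(4, 5)), rng.normal(size=4)
c, alpha = context_vector(H, s_prev, lambda h, s: additive_score(h, s, W_a, v_a))
```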

Significance: Enhanced performance in sequence-to-sequence tasks by allowing models to utilize information from all input positions.

9. Graph Neural Networks (GNNs)

Scarselli et al., 2009; Kipf and Welling, 2016

We are thankful for GNNs, which extend neural networks to graph-structured data, enabling the modeling of relational information.

Message Passing Framework:

Node Representation Update: $h_v^{(k)} = \sigma\left( W_0 h_v^{(k-1)} + W \sum_{u \in \mathcal{N}(v)} h_u^{(k-1)} \right)$ where:
- $h_v^{(k)}$ is the representation of node $v$ at layer $k$ .
- $\mathcal{N}(v)$ is the set of neighbors of node $v$ .
- $W$ and $W_0$ are learnable weight matrices.
- $\sigma$ is a nonlinear activation function.
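Graph Convolutional Networks (GCNs): $H^{(k+1)} = \sigma\left( \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} H^{(k)} W^{(k)} \right)$ where:
- $\tilde{A} = A + I$ is the adjacency matrix with added self-loops.
- $\tilde{D}$ is the degree matrix of $\tilde{A}$ .
- $H^{(k)}$ is the matrix of node representations at layer $k$ and $W^{(k)}$ is the learnable weight matrix of that layer.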
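
As a sketch of one GCN layer in the style of Kipf and Welling, the code below applies the symmetrically normalized propagation $\sigma(\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} H W)$ with dense NumPy matrices. We assume ReLU for $\sigma$; real implementations would typically use sparse operations.

```python
import numpy as np

def gcn_layer(A, H, W):
    # A: (n, n) adjacency matrix, H: (n, d_in) node features, W: (d_in, d_out).
    A_tilde = A + np.eye(A.shape[0])          # add self-loops
    degrees = A_tilde.sum(axis=1)
    D_inv_sqrt = np.diag(degrees ** -0.5)
    A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt  # symmetric normalization
    return np.maximum(A_hat @ H @ W, 0.0)      # ReLU as the nonlinearity

# Two connected nodes with one-hot features and an identity weight matrix.
A = np.array([[0.0, 1.0], [1.0, 0.0]])
H_next = gcn_layer(A, np.eye(2), np.eye(2))
```

In this toy graph each node ends up with an equal mix of its own feature and its neighbor's feature.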

Significance: Enabled advancements in social network analysis, molecular chemistry, and recommendation systems.

10. Self-Supervised Learning and Contrastive Learning

He et al., 2020; Chen et al., 2020

We are grateful for self-supervised learning techniques that leverage unlabeled data by creating surrogate tasks.

Contrastive Learning Objective:

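InfoNCE Loss (for a batch of $N$ samples, giving $2N$ augmented views): $\mathcal{L}_{i,j} = -\log \frac{\exp(\text{sim}(z_i, z_j)/\tau)}{\sum_{k=1}^{2N} \textbf{1}_{[k \neq i]} \exp(\text{sim}(z_i, z_k)/\tau)}$ where: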
- $z_i$ and $z_j$ are representations of two augmented views of the same sample.
- $\text{sim}(u, v) = \frac{u^\top v}{\|u\| \|v\|}$ is the cosine similarity.
- $\tau$ is a temperature parameter.
- $\textbf{1}_{[k \neq i]}$ is an indicator function equal to 1 when $k \neq i$ .
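
A compact sketch of this loss in the form used by SimCLR, where each view's positive is its partner view and all other views in the batch act as negatives. The row layout (rows $2m$ and $2m+1$ hold the two views of sample $m$) is an assumption made for this example.

```python
import numpy as np

def nt_xent_loss(z, tau=0.5):
    # z: (2N, d) embeddings; rows 2m and 2m+1 are two views of sample m.
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # cosine similarity
    sim = z @ z.T / tau
    np.fill_diagonal(sim, -np.inf)                     # indicator: exclude k == i
    sim_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - sim_max - np.log(np.exp(sim - sim_max).sum(axis=1, keepdims=True))
    n = z.shape[0]
    partner = np.arange(n) ^ 1                         # 0<->1, 2<->3, ...
    return -log_prob[np.arange(n), partner].mean()

aligned = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
misaligned = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
loss_aligned = nt_xent_loss(aligned, tau=1.0)
loss_misaligned = nt_xent_loss(misaligned, tau=1.0)
```

The loss is lower when the two views of each sample map to the same direction, which is the behavior the objective is meant to reward.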

Significance: Improved representation learning, leading to state-of-the-art results in computer vision tasks without requiring labeled data.

11. Differential Privacy in Machine Learning

Abadi et al., 2016

We give thanks for techniques that allow training models while preserving the privacy of individual data points.

Differential Privacy Guarantee:

Definition: A randomized algorithm $\mathcal{A}$ provides $(\epsilon, \delta)$ -differential privacy if for all datasets $D$ and $D'$ differing on one element, and all measurable subsets $S$ : $P[\mathcal{A}(D) \in S] \leq e^\epsilon P[\mathcal{A}(D') \in S] + \delta$
Noise Addition: Applies calibrated noise to gradients during training to ensure privacy.
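
A minimal sketch of the noisy gradient step in the spirit of DP-SGD: each example's gradient is clipped to a maximum norm, the clipped gradients are summed, and Gaussian noise scaled to the clipping norm is added before averaging. The function and parameter names are ours, and the privacy accounting that turns the noise level into an $(\epsilon, \delta)$ guarantee is not shown.

```python
import numpy as np

def dp_noisy_gradient(per_example_grads, clip_norm, noise_multiplier, rng):
    # per_example_grads: (batch, dim), one gradient per training example.
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_example_grads * scale                 # each row has norm <= clip_norm
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=clipped.shape[1])
    return (clipped.sum(axis=0) + noise) / len(per_example_grads)

grads = np.array([[3.0, 4.0], [0.3, 0.4]])
rng = np.random.default_rng(0)
no_noise = dp_noisy_gradient(grads, clip_norm=1.0, noise_multiplier=0.0, rng=rng)
```

Clipping bounds how much any single example can move the update, which is what makes the added noise sufficient to mask an individual's contribution.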

Significance: Enabled the deployment of machine learning models in privacy-sensitive applications.

12. Federated Learning

McMahan et al., 2017

We are thankful for federated learning, which allows training models across multiple decentralized devices while keeping data localized.

Federated Averaging Algorithm:

Local Update: Each client $k$ updates model parameters $\theta$ using local data $D_k$ : $\theta_k^{t+1} = \theta^t - \eta \nabla_{\theta} \mathcal{L}(\theta^t; D_k)$
Global Aggregation: The server aggregates updates: $\theta^{t+1} = \sum_{k=1}^K \frac{n_k}{n} \theta_k^{t+1}$ where:
- $n_k$ is the number of samples at client $k$ .
- $n = \sum_{k=1}^K n_k$ is the total number of samples across all clients.
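
The aggregation step is a sample-weighted average of the client parameters, $\theta^{t+1} = \sum_k \frac{n_k}{n} \theta_k^{t+1}$, which can be sketched as follows. Parameters are represented as flat NumPy vectors for simplicity, and the function name is illustrative.

```python
import numpy as np

def federated_average(client_params, client_sizes):
    # client_params: list of parameter vectors, one per client.
    # client_sizes: number of local samples n_k held by each client.
    sizes = np.asarray(client_sizes, dtype=float)
    weights = sizes / sizes.sum()
    return sum(w * p for w, p in zip(weights, client_params))

global_params = federated_average(
    [np.array([1.0, 1.0]), np.array([3.0, 3.0])],
    client_sizes=[1, 3],
)
```

The client with three times as much data pulls the global model three times as strongly.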

Significance: Addressed privacy concerns and bandwidth limitations in distributed systems.

13. Neural Architecture Search (NAS)

Zoph and Le, 2016

We appreciate NAS for automating the design of neural network architectures using optimization algorithms.

Approaches:

Reinforcement Learning-Based NAS: Uses an RNN controller to generate architectures, trained to maximize expected validation accuracy.
Differentiable NAS (DARTS): Models the architecture search space as continuous, enabling gradient-based optimization. Objective Function: $\min_{\alpha} \mathcal{L}_{\text{val}}(w^*(\alpha), \alpha)$ where $w^*(\alpha)$ is obtained by: $w^*(\alpha) = \arg\min_w \mathcal{L}_{\text{train}}(w, \alpha)$

Significance: Reduced human effort in designing architectures, leading to efficient and high-performing models.

14. Optimizer Advancements (Adam, AdaBound, RAdam)

We are thankful for advancements in optimization algorithms that improved training efficiency.

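Adam Optimizer (Kingma and Ba, 2014)
Update Rules: $m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t$, $v_t = \beta_2 v_{t-1} + (1 - \beta_2) g_t^2$, $\hat{m}_t = \frac{m_t}{1 - \beta_1^t}$, $\hat{v}_t = \frac{v_t}{1 - \beta_2^t}$, $\theta_t = \theta_{t-1} - \eta \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}$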
where:
- $g_t$ is the gradient at time step $t$ .
- $\beta_1$ and $\beta_2$ are hyperparameters controlling the exponential decay rates.
- $\eta$ is the learning rate.
- $\epsilon$ is a small constant to prevent division by zero.
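
One Adam step can be written directly from the update rules in the original paper: exponential moving averages of the gradient and its square, bias correction, then a scaled parameter update. The sketch below uses the paper's default hyperparameters; the function name is ours.

```python
import numpy as np

def adam_step(theta, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # t is the 1-based step count, needed for bias correction.
    m = beta1 * m + (1 - beta1) * g          # first-moment estimate
    v = beta2 * v + (1 - beta2) * g**2       # second-moment estimate
    m_hat = m / (1 - beta1**t)               # correct the bias toward zero
    v_hat = v / (1 - beta2**t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

theta, m, v = adam_step(
    theta=np.array([1.0]), g=np.array([2.0]),
    m=np.zeros(1), v=np.zeros(1), t=1,
)
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so the parameter moves by roughly the learning rate regardless of gradient scale.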

Significance: Improved optimization efficiency and convergence in training deep neural networks.

15. Diffusion Models for Generative Modeling

Ho et al., 2020; Song et al., 2020

We give thanks for diffusion models, which are generative models that learn data distributions by reversing a diffusion (noising) process.

Key Concepts:

Forward Diffusion Process: Gradually adds Gaussian noise to data over $T$ timesteps.
Noising Schedule: $q(x_t | x_{t-1}) = \mathcal{N}(x_t; \sqrt{1 - \beta_t} x_{t-1}, \beta_t I)$
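Reverse Process: Learns to denoise from $x_T$ back to $x_0$ .
Objective Function: $\mathcal{L}_{\text{simple}} = \mathbb{E}_{t, x_0, \epsilon}\left[ \left\| \epsilon - \epsilon_\theta(x_t, t) \right\|^2 \right]$ where: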
- $\epsilon$ is the noise added to the data.
- $\epsilon_\theta(x_t, t)$ is the model’s prediction of the noise at timestep $t$ .
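
A sketch of the forward process and the noise-prediction loss, assuming the linear $\beta_t$ schedule from Ho et al. (2020). It relies on the closed form $x_t = \sqrt{\bar{\alpha}_t} x_0 + \sqrt{1 - \bar{\alpha}_t} \epsilon$ with $\bar{\alpha}_t = \prod_{s \le t} (1 - \beta_s)$, which follows from composing the per-step Gaussian kernels. The noise-prediction network itself is left out, and the loss is averaged over elements rather than summed.

```python
import numpy as np

def make_schedule(T, beta_start=1e-4, beta_end=0.02):
    betas = np.linspace(beta_start, beta_end, T)
    alpha_bars = np.cumprod(1.0 - betas)     # cumulative signal retention
    return betas, alpha_bars

def q_sample(x0, t, alpha_bars, rng):
    # Jump straight from x_0 to x_t (t is a 0-based index into the schedule).
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps
    return x_t, eps

def simple_loss(eps, eps_pred):
    # Mean squared error between the true and the predicted noise.
    return np.mean((eps - eps_pred) ** 2)

betas, alpha_bars = make_schedule(T=1000)
rng = np.random.default_rng(0)
x_t, eps = q_sample(np.ones((2, 4)), t=500, alpha_bars=alpha_bars, rng=rng)
```

By the last timestep almost none of the original signal remains, so sampling can start from pure Gaussian noise.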

Significance: Achieved state-of-the-art results in image generation, rivaling GANs without their training instability.

Give Thanks…

This Thanksgiving, let’s celebrate and express our gratitude for these groundbreaking contributions to machine learning. These technical advancements have not only pushed the boundaries of what’s possible but have also laid the foundation for future innovations that will continue to shape our world.

May we continue to build upon these foundations and contribute to the growing field of machine learning.

References

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is All You Need. Advances in Neural Information Processing Systems. arXiv:1706.03762
Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv:1810.04805
Radford, A., Narasimhan, K., Salimans, T., & Sutskever, I. (2018). Improving Language Understanding by Generative Pre-training. OpenAI Blog.
Kingma, D. P., & Welling, M. (2013). Auto-Encoding Variational Bayes. arXiv:1312.6114
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., & Bengio, Y. (2014). Generative Adversarial Nets. Advances in Neural Information Processing Systems.
Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., Graves, A., Riedmiller, M., Fidjeland, A. K., Ostrovski, G., Petersen, S., Beattie, C., Sadik, A., Antonoglou, I., King, H., Kumaran, D., Wierstra, D., Legg, S., & Hassabis, D. (2015). Human-level Control through Deep Reinforcement Learning. Nature.
Ioffe, S., & Szegedy, C. (2015). Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. International Conference on Machine Learning (ICML).
Ba, J. L., Kiros, J. R., & Hinton, G. E. (2016). Layer Normalization. arXiv:1607.06450
Bahdanau, D., Cho, K., & Bengio, Y. (2014). Neural Machine Translation by Jointly Learning to Align and Translate. arXiv:1409.0473
Kipf, T. N., & Welling, M. (2016). Semi-Supervised Classification with Graph Convolutional Networks. arXiv:1609.02907
He, K., Fan, H., Wu, Y., Xie, S., & Girshick, R. (2020). Momentum Contrast for Unsupervised Visual Representation Learning. arXiv:1911.05722
Abadi, M., Chu, A., Goodfellow, I., McMahan, H. B., Mironov, I., Talwar, K., & Zhang, L. (2016). Deep Learning with Differential Privacy. ACM SIGSAC Conference on Computer and Communications Security.
McMahan, B., Moore, E., Ramage, D., Hampson, S., & y Arcas, B. A. (2017). Communication-Efficient Learning of Deep Networks from Decentralized Data. arXiv:1602.05629
Zoph, B., & Le, Q. V. (2016). Neural Architecture Search with Reinforcement Learning. arXiv:1611.01578
Kingma, D. P., & Ba, J. (2014). Adam: A Method for Stochastic Optimization. arXiv:1412.6980
Ho, J., Jain, A., & Abbeel, P. (2020). Denoising Diffusion Probabilistic Models. arXiv:2006.11239

Resilient AI: Advancing Robustness Against Adversarial Threats with D-ReLU

Artificial intelligence (AI) is now embedded in everyday life, from self-driving cars to medical diagnostic tools, enabling tasks to be performed faster and, in some cases, more accurately than humans. However, this rapid advancement comes with significant challenges, particularly in the form of adversarial attacks. These attacks exploit small, often imperceptible changes in input data to deceive AI systems into making incorrect decisions. For example, a strategically placed sticker on a stop sign might cause an AI-powered car to misinterpret it as a speed limit sign, creating a potentially dangerous situation. Similarly, small perturbations added to a picture of your dog can lead a state-of-the-art AI model to confuse it with a cat.

The Role of ReLU and Its Limitations

The Rectified Linear Unit (ReLU) activation function is a foundational component of many AI models. Its simplicity and efficiency have made it a go-to choice for training deep learning networks. However, ReLU’s unrestricted output can make models vulnerable to adversarial noise, leading to cascading errors in predictions. Attempts to address this vulnerability, such as Static-Max-Value ReLU (S-ReLU or capped ReLU), have introduced fixed output caps, but these solutions often underperform on more complex datasets and tasks.
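
To illustrate the difference between an unbounded and a capped activation, here is a minimal sketch of ReLU next to a fixed-cap ReLU in the style of S-ReLU. The cap value of 6.0 is an arbitrary choice for the example and is not taken from the paper.

```python
import numpy as np

def relu(x):
    # Unbounded above: a large input passes through unchanged.
    return np.maximum(x, 0.0)

def capped_relu(x, cap):
    # Fixed-cap variant: outputs are clipped to the range [0, cap].
    return np.minimum(np.maximum(x, 0.0), cap)

x = np.array([-1.0, 0.5, 10.0])
plain = relu(x)
capped = capped_relu(x, cap=6.0)
```

The capped version limits how far a single perturbed activation can push downstream layers, at the cost of discarding information above the cap.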

Introducing D-ReLU

D-ReLU represents a significant advancement over traditional ReLU. It incorporates a dynamic output cap that adjusts based on the data flowing through the network. This adaptability serves as a robust defense mechanism against adversarial inputs while maintaining computational efficiency. In essence, D-ReLU acts as a self-adjusting safeguard, preserving model integrity even under duress.

Key Features of D-ReLU:

Adaptive Output Limits: D-ReLU employs learnable caps that evolve during training, enabling models to balance robustness and accuracy effectively.
Enhanced Resilience: D-ReLU has demonstrated superior performance against adversarial attacks, including FGSM, PGD, and Carlini-Wagner, while maintaining consistent performance on standard datasets.
Scalability: Tested on large-scale datasets like CIFAR-10, CIFAR-100, and TinyImagenet, D-ReLU has proven its ability to scale effectively without degradation in performance.
Efficient Training: Unlike adversarial training methods, which require extensive additional computations, D-ReLU achieves robustness naturally, streamlining the training process.
Real-World Viability: D-ReLU excels in real-world scenarios, including black-box attack settings where attackers lack full knowledge of the model.

The Broader Implications

In applications where reliability and safety are paramount—such as autonomous vehicles, financial systems, and medical imaging—D-ReLU offers a compelling solution to the challenges posed by adversarial inputs. By enhancing a model’s resilience without sacrificing performance, D-ReLU provides a vital upgrade for AI systems operating in high-stakes environments.

Future Directions

The potential of D-ReLU extends beyond current implementations. Areas of exploration include:

Further optimization for improved performance.
Applications in natural language processing and audio tasks.
Integration with complementary robust training methods for enhanced results.

For a detailed analysis and technical insights, see our paper, cited below. If you are working on AI models, we encourage you to experiment with D-ReLU and share your experiences.

Sooksatra, Korn, and Pablo Rivas. 2024. “Dynamic-Max-Value ReLU Functions for Adversarially Robust Machine Learning Models” Mathematics 12, no. 22: 3551. https://doi.org/10.3390/math12223551

About the Author

Korn Sooksatra is a Ph.D. student at Baylor University, specializing in adversarial machine learning and AI robustness.

Enhancing AI Safety: Improving Adversarial Robustness in Vision Language Models

The Research Question

How can we improve the adversarial robustness of Vision Language Models (VLMs) to ensure their safe deployment in critical applications? This question drives our exploration into focused adversarial training techniques that improve the security of these models without excessive computational costs.

Adversarial Robustness and AI Safety

Adversarial attacks involve subtle manipulations of input data designed to deceive machine learning models into making incorrect predictions. In the context of VLMs, these attacks can have severe implications, especially when these models are deployed in sensitive areas such as autonomous driving, healthcare, and content moderation.

Enhancing the adversarial robustness of VLMs is crucial for AI safety. Robust models can withstand adversarial inputs, ensuring reliable performance and preventing malicious exploitation. Our research focuses on a novel approach to achieve this robustness by selectively re-training components of the multimodal architecture.

Our Approach

Traditional methods to improve model robustness often involve adversarial training, which integrates adversarial examples into the training process. However, this can be computationally intensive, particularly for complex models like VLMs that process images and text.

Our study introduces a more efficient strategy: adversarially re-training only the language model component of the VLM. This targeted approach leverages the Fast Gradient Sign Method (FGSM) to generate adversarial examples and incorporates them into the training of the text decoder. We maintain computational efficiency by keeping the image encoder fixed while significantly enhancing the model’s overall robustness.
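
FGSM perturbs an input by a step of size $\epsilon$ in the direction of the sign of the loss gradient with respect to that input: $x_{\text{adv}} = x + \epsilon \cdot \text{sign}(\nabla_x \mathcal{L})$. The sketch below shows the idea on a toy logistic-regression classifier, which is our stand-in for illustration only. In the study, the gradient would come from the captioning model's loss instead.

```python
import numpy as np

def bce_loss(x, y, w, b):
    # Binary cross-entropy of a logistic-regression classifier.
    p = 1.0 / (1.0 + np.exp(-(w @ x + b)))
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def bce_grad_wrt_input(x, y, w, b):
    # dL/dx = (p - y) * w for the loss above.
    p = 1.0 / (1.0 + np.exp(-(w @ x + b)))
    return (p - y) * w

def fgsm_perturb(x, grad_wrt_x, epsilon):
    # One signed-gradient step that increases the loss.
    return x + epsilon * np.sign(grad_wrt_x)

w, b = np.array([1.0, -2.0]), 0.0
x, y = np.array([0.0, 0.0]), 1.0
x_adv = fgsm_perturb(x, bce_grad_wrt_input(x, y, w, b), epsilon=0.1)
```

Adversarial training then mixes examples like `x_adv` into the training batches so the model learns to classify them correctly.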

Key Findings

Adversarial Training Efficiency: Adversarially re-training only the language model yields robustness comparable to full adversarial training, with reduced computational demands.
Selective Training Impact: Freezing the image encoder and focusing on the text decoder maintains high performance and robustness. In contrast, training only the image encoder results in a significant performance drop.
Benchmark Results: Experiments on the Flickr8k and COCO datasets demonstrate that our selective adversarial training approach effectively mitigates the impact of adversarial attacks, as evidenced by improved BLEU scores and model performance under adversarial conditions.

Implications for Ethical AI

Our findings support the development of more robust and secure AI systems, which is crucial for ethical AI deployment. By focusing on adversarial robustness, we contribute to the broader goal of AI safety, ensuring that multimodal models can be trusted in real-world applications.

For a detailed exploration of our methodology and findings, read the full paper pre-print: https://arxiv.org/abs/2407.21174

References

Rashid, M.B., & Rivas, P. (2024). AI Safety in Practice: Enhancing Adversarial Robustness in Multimodal Image Captioning. 3rd Workshop on Ethical Artificial Intelligence: Methods and Applications, ACM SIGKDD’24. https://arxiv.org/abs/2407.21174

About the Author

Maisha Binte Rashid is a Ph.D. student at Baylor University, specializing in AI safety and multimodal machine learning.

Uncovering Patterns in Car Parts – A Step Towards Combating a Cybercrime

The black market for stolen car parts is a significant problem, exacerbated by the rise of online marketplaces like Craigslist or OfferUp, where stolen goods are often sold under the radar. In response to this growing issue, our research team at Baylor University has been leveraging cutting-edge AI techniques to detect patterns in car part sales that could signal illicit activity. This work is part of the NSF-funded Research Experiences for Undergraduates (REU) program, which provides undergraduate students with hands-on research experience in critical areas like artificial intelligence. Our project, supported by NSF Grant #2210091, investigates the potential of deep learning models to analyze vast amounts of data from online listings, offering a new tool in the fight against stolen car parts.

Why This Research Matters

The theft and resale of car parts not only affect vehicle owners but also contribute to organized crime. Detecting patterns in how stolen parts are sold online can help law enforcement track and dismantle these criminal networks. This project also presents a unique challenge to the AI research community: the complexity of analyzing unstructured, noisy data from real-world platforms. By utilizing the Vision Transformer (ViT) for image analysis, our research offers a different approach compared to previous works that employed multimodal models like ImageBind and OpenFlamingo.

Dataset and Embedding Extraction

Our dataset comprises thousands of car parts advertisements scraped from Craigslist and OfferUp, each including images and textual descriptions. To process the image data, we used the Vision Transformer (ViT), a model pre-trained on ImageNet-21k. ViT processes images by splitting them into 16×16-pixel patches, allowing for the extraction of key features from each image. These features were converted into embeddings—high-dimensional vectors that represent each image’s content in a form that the model can analyze.
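
The patch-splitting step can be sketched with a reshape. The 224×224 RGB input size is an assumption for this example, since the post does not state the resolution used. A full ViT would then linearly project each flattened patch and add position information before the Transformer layers.

```python
import numpy as np

def patchify(image, patch=16):
    # image: (H, W, C) with H and W divisible by the patch size.
    H, W, C = image.shape
    x = image.reshape(H // patch, patch, W // patch, patch, C)
    x = x.transpose(0, 2, 1, 3, 4)            # (rows, cols, patch, patch, C)
    return x.reshape(-1, patch * patch * C)   # one flattened row per patch

image = np.arange(224 * 224 * 3, dtype=float).reshape(224, 224, 3)
patches = patchify(image)
```

With these sizes the image becomes a sequence of 196 patches, each flattened to 768 values.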

We extracted embeddings for nearly 85,000 images, which were then compiled into a CSV file for further analysis, including clustering and visualization. Unlike prior works by Hamara & Rivas (2024) and Rashid & Rivas (2024), which utilized multimodal models like ImageBind and OpenFlamingo to fuse image and text data, we focused solely on image embeddings in this phase to assess the effectiveness of ViT in capturing visual patterns related to illicit activities.

Clustering and Evaluation

With the embeddings extracted, we used UMAP (Uniform Manifold Approximation and Projection) to project the high-dimensional data into a more interpretable 2D space for visualization. We then applied K-Means clustering, a widely used algorithm for grouping data, and experimented with different embedding dimensions (16, 32, 64, and 128) to identify the configuration that produced the best-defined clusters.

Among these, 64 dimensions proved to be the best suited for our dataset, as determined by three key clustering performance metrics:

Silhouette Score: Measures how similar an object is to its own cluster compared to other clusters. A value of 0.015 indicated that some clusters were poorly defined.
Calinski-Harabasz Index: Evaluates the variance ratio between clusters versus within clusters.
Davies-Bouldin Index: Measures the average similarity between each cluster and its most similar cluster.

Although 128 dimensions performed well in some tests, 64 dimensions provided the clearest balance between cluster purity and computational efficiency. The low silhouette score indicates considerable overlap between clusters, yet most clusters remained recognizable. Much of the overlap came from outliers, that is, posts that displayed mixed or unclear features, such as images showing both powertrains and vehicle exteriors.

Findings and Analysis

Using the K-Means algorithm, we identified 20 distinct clusters, each representing different categories of car parts. Here are some key findings:

Cluster 0: Primarily contained exterior shots of full vehicles.
Cluster 1: Featured exterior components like mirrors and bumpers.
Cluster 2: Focused on powertrain parts such as engines and transmissions.
Cluster 3: Showcased body panels including doors, trunks, and hoods.
Cluster 4: Grouped images of towing accessories like trailer hitches.

After clustering, we applied K-Nearest Neighbors (KNN) to identify the top 10 posts nearest to each cluster centroid, which allowed us to analyze representative posts and confirm the coherence of each cluster. Despite the general success of this approach, outliers emerged in the UMAP visualization, indicating the need for further refinement to handle posts with mixed features. This challenge is common in image analysis, particularly when models rely solely on visual data without the contextual information that multimodal models can provide.
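
Selecting the posts nearest to a centroid amounts to sorting by distance, as in this sketch. Euclidean distance and the array names are assumptions, as the post does not specify the metric used.

```python
import numpy as np

def nearest_to_centroid(embeddings, centroid, k=10):
    # embeddings: (n_posts, dim); returns indices of the k closest posts,
    # ordered from nearest to farthest.
    distances = np.linalg.norm(embeddings - centroid, axis=1)
    return np.argsort(distances)[:k]

embeddings = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
top = nearest_to_centroid(embeddings, centroid=np.array([0.9, 0.9]), k=2)
```

Inspecting these representative posts by hand is a quick check on whether a cluster corresponds to a meaningful category.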

Comparative Analysis with Prior Work

Our approach contrasts with that of Hamara & Rivas (2024) and Rashid & Rivas (2024), who utilized multimodal models like ImageBind and OpenFlamingo to integrate image and text data for enhanced analysis. While their methods leveraged the fusion of multiple data types to capture richer context, we aimed to assess the capabilities of ViT in isolating visual patterns indicative of illicit activity. This comparison highlights the trade-offs between focusing on single-modality models versus multimodal approaches in detecting complex patterns within unstructured data.

Broader Impact

This research demonstrates the potential of AI in analyzing large, unstructured datasets from online marketplaces, providing law enforcement with new tools to monitor and track stolen car parts. From a technical perspective, our project highlights the effectiveness of using ViT for image analysis in this context. As we continue refining our models and consider integrating multimodal approaches inspired by prior work, our collaboration with cross-disciplinary partners will ensure that this system becomes a valuable tool for combating the sale of stolen goods online.

As stated previously, the silhouette score for the dataset proved to be notably small, which was supported by the visualization containing numerous outliers. This may be attributed to clusters lacking clear definition, meaning that several posts contained images without many distinguishable features. This is understandable considering that while clusters emphasized a focus on specific car parts, many images still displayed various other vehicle components. For instance, although Cluster 2 primarily featured images of powertrains, the posts in this cluster also included shots of the exterior and body panels of the vehicle. This is logical as sellers often aim to showcase multiple facets of the vehicle when listing it, explaining the lack of focus on specific car parts.

About the Author

Cameron Armijo is a Computer Science undergraduate student at Baylor University, specializing in data mining.

Gabor Filters as Initializers for Convolutional Neural Networks: A Study on Inductive Bias and Performance on Image Classification

Rivas, Pablo, and Mehang Rai. 2023. “Enhancing CNNs Performance on Object Recognition Tasks with Gabor Initialization” Electronics 12, no. 19: 4072. https://doi.org/10.3390/electronics12194072

Our latest journal article, authored by Baylor graduate and former Baylor.AI lab member Mehang Rai, MS, marks an advancement in Convolutional Neural Networks (CNNs). The paper, titled “Enhancing CNNs Performance on Object Recognition Tasks with Gabor Initialization,” has not only garnered attention in academic circles but also achieved the prestigious Best Poster Award at the LXAI workshop at ICML 2023, a top-tier conference in the field.

Pablo Rivas and Mehang Rai, “Gabor Filters as Initializers for Convolutional Neural Networks: A Study on Inductive Bias and Performance on Image Classification”, in The LXAI Workshop @ International Conference on Machine Learning (ICML 2023), 7/2023.

A Journey from Concept to Recognition

Our journey with this research began with early discussions and progress shared here. The idea was simple yet profound: exploring the potential of Gabor filters, known for their exceptional feature extraction capabilities, in enhancing the performance of CNNs for object recognition tasks. This exploration led to a comprehensive study comparing the performance of Gabor-initialized CNNs against traditional CNNs with random initialization across six object recognition datasets.

Key Findings and Contributions

The results were fascinating to us. The Gabor-initialized CNNs consistently outperformed traditional models in accuracy, area under the curve, minimum loss, and convergence speed. These findings provide robust evidence in favor of using Gabor-based methods for initializing the receptive fields of CNN architectures. The technique had been explored before with little success because researchers constrained the Gabor filters during training, which prevented gradient descent from optimizing the filters as needed for general-purpose object recognition.

Our research contributes significantly to the field by demonstrating:

Improved performance in object classification tasks with Gabor-initialized CNNs.
Superior performance of random configurations of Gabor filters in the receptive layer, especially with complex datasets.
Enhanced performance of CNNs in a shorter time frame when incorporating Gabor filters.

Implications and Future Directions

This study reaffirms the historical success of Gabor filters in image processing and opens new avenues for their application in modern CNN architectures. The impact of this research is vast, suggesting potential enhancements in various applications of CNNs, from medical imaging to autonomous vehicles.

As we celebrate this achievement, we also look forward to further research. Future studies could explore initializing other vision architectures, such as Vision Transformers (ViTs), with Gabor filters.

It’s a proud moment for us at the lab to see our research recognized on a global platform like ICML 2023 and published in a journal. This accomplishment is a testament to our commitment to pushing the boundaries of AI and ML research. We congratulate Mehang Rai for this remarkable achievement and thank the AI community for their continued support and recognition.

NSF Award: Using NLP to Identify Suspicious Transactions in Omnichannel Online C2C Marketplaces

Baylor University has been awarded funding under the SaTC program for Enabling Interdisciplinary Collaboration. The grant is led by Principal Investigator Dr. Pablo Rivas, together with an amazing group of multidisciplinary researchers:

Dr. Gissella Bichler from California State University San Bernardino, Center for Criminal Justice Research, School of Criminology and Criminal Justice.
Dr. Tomas Cerny is at Baylor University in the Computer Science Department, leading software engineering research.
Dr. Laurie Giddens from the University of North Texas, a faculty member at the G. Brint Ryan College of Business.
Dr. Stacy Petter is at Wake Forest University in the School of Business. She and Dr. Giddens have extensive research experience and funding in the study of human trafficking.
Dr. Javier Turek, a Research Scientist in Machine Learning at Intel Labs, is our collaborator in matters related to machine learning for natural language processing.

We also have two Ph.D. students working on this project: Alejandro Rodriguez and Korn Sooksatra.

This project was motivated by the increasing pattern of people buying and selling goods and services directly from other people via online marketplaces. While many online marketplaces enable transactions among reputable buyers and sellers, some platforms are vulnerable to suspicious transactions. This project investigates whether it is possible to automate the detection of illegal goods or services within online marketplaces. First, the project team will analyze the text of online advertisements and marketplace policies to identify indicators of suspicious activity. Then, the team will adapt the findings to a specific context to locate stolen motor vehicle parts advertised via online marketplaces. Together, the work will lead to general ways to identify signals of illegal online sales that can be used to help people choose trustworthy marketplaces and avoid illicit actors. This project will also provide law enforcement agencies and online marketplaces with insights to gather evidence on illicit goods or services on those marketplaces.

This research assesses the feasibility of modeling illegal activity in online consumer-to-consumer (C2C) platforms, using platform characteristics, seller profiles, and advertisements to prioritize investigations using actionable intelligence extracted from open-source information. The project is organized around three main steps. First, the research team will combine knowledge from computer science, criminology, and information systems to analyze online marketplace technology platform policies and identify platform features, policies, and terms of service that make platforms more vulnerable to criminal activity. Second, building on the understanding of platform vulnerabilities developed in the first step, the researchers will generate and train deep learning-based language models to detect illicit online commerce. Finally, to assess the generalizability of the identified markers, the investigators will apply the models to markets for motor vehicle parts, a licit marketplace that sometimes includes sellers offering stolen goods. This project establishes a cross-disciplinary partnership among a diverse group of researchers from different institutions and academic disciplines with collaborators from law enforcement and industry to develop practical, actionable insights.

Self-supervised modeling. Given a corpus and ontologies associated with a C2C domain of interest, we will extract features and apply attention mechanisms for self-supervised and supervised tasks. The self-supervised tasks include completing missing information and domain-specific text encoding for learning representations. The supervised tasks will then leverage these representations to learn the relationships with targets.
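To make the "completion of missing information" task concrete, here is a minimal sketch of how training pairs for such a task could be built from advertisement text. It is an illustration only, not the project's pipeline: the `[MASK]` placeholder, the `mask_tokens` helper, the masking probability, and the sample ad are all assumptions made up for this example.

```python
import random

MASK = "[MASK]"  # hypothetical placeholder token, not taken from the project


def mask_tokens(tokens, mask_prob=0.15, seed=None):
    """Hide a random subset of tokens and remember what was hidden.

    Returns the masked sequence and a dict mapping each masked position
    to its original token. A model would be trained to recover the
    original tokens from the masked sequence.
    """
    rng = random.Random(seed)
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            masked.append(MASK)
            targets[i] = tok
        else:
            masked.append(tok)
    return masked, targets


# Made-up advertisement text, used only to show the data format.
ad = "used catalytic converter for sale cash only no questions".split()
masked_ad, targets = mask_tokens(ad, mask_prob=0.3, seed=7)
print(masked_ad)
print(targets)
```

No labels are needed to build these pairs, which is what makes the task self-supervised. The representations learned this way can then be reused by a supervised classifier.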

Dr. Bejarano’s Work Is Recognized by Amazon

According to the World Federation of the Deaf, there are more than 70 million deaf people worldwide, and more than 80% of them live in developing countries. Recent research by Dr. Gissella Bejarano, our very own postdoctoral research scientist, has been recognized for its impact on computer vision and speech recognition, providing opportunities to help individuals with disabilities. With support from AWS, Dr. Bejarano is finding better ways to translate Peruvian Sign Language using computer vision and natural language processing.

Read more about this in this release by AWS.

Evaluating Accuracy and Adversarial Robustness of Quanvolutional Neural Networks

In some cases, combining a quantum circuit with a convolutional neural network (CNN) can give better results than a classical CNN. In our recent article, we show an example of such a case, using accuracy as a measure of performance and adversarial examples as a measure of robustness. Check it out: [ bib | pdf ]

Enhancing Adversarial Examples on Deep QNetworks with Previous Information

This work finds strong adversarial examples for Deep Q Networks, which are well-known deep reinforcement learning models. We combine two subproblems of finding adversarial examples in deep reinforcement learning, finding which states to perturb and determining how much to perturb them, so that the attack can optimize both jointly. We trained Deep Q Networks to play two Atari games, Breakout and Space Invaders, and then used our attack to find adversarial examples in those games. The attack achieves state-of-the-art results, and we show that it is natural and stealthy. Paper: [ bib | pdf ]
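The two subproblems named above can be illustrated with a toy example. The sketch below is not the attack from the paper. It assumes a linear Q-function Q(s) = W s + b, for which the smallest L-infinity perturbation that changes the greedy action has a closed form. A real Deep Q Network is nonlinear and would need a gradient-based approximation instead. All function names and numbers are hypothetical.

```python
import numpy as np


def min_flip_eps(W, b, s):
    """Smallest L-inf budget at which the greedy action of Q(s) = W s + b
    ties with another action; any larger budget changes the action.

    Returns (eps, greedy_action, target_action).
    """
    q = W @ s + b
    best = int(np.argmax(q))
    eps_best, target = np.inf, None
    for a in range(len(q)):
        if a == best:
            continue
        g = W[best] - W[a]          # gradient of the margin q[best] - q[a] w.r.t. s
        l1 = np.abs(g).sum()
        if l1 == 0:
            continue                # this margin cannot be changed by perturbing s
        eps = (q[best] - q[a]) / l1  # budget that drives this margin to zero
        if eps < eps_best:
            eps_best, target = eps, a
    return eps_best, best, target


def perturb(W, s, best, target, eps):
    """Move every coordinate of s by eps in the direction that shrinks the margin."""
    g = W[best] - W[target]
    return s - eps * np.sign(g)


def pick_state(W, b, states):
    """'Which state to perturb': the state that needs the smallest budget."""
    budgets = [min_flip_eps(W, b, s)[0] for s in states]
    i = int(np.argmin(budgets))
    return i, budgets[i]


# Toy Q-function with 3 actions over 2-dimensional states (made-up numbers).
W = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
b = np.zeros(3)
states = [np.array([2.0, 1.0]), np.array([3.0, 0.0])]

i, eps = pick_state(W, b, states)
_, best, target = min_flip_eps(W, b, states[i])
s_adv = perturb(W, states[i], best, target, 1.01 * eps)  # just past the tie
print(i, eps, best, int(np.argmax(W @ s_adv + b)))
```

In this toy setting the two questions are answered together: the state to attack is the one whose greedy action can be changed with the smallest perturbation.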

On the Performance of Convolutional Neural Networks Initialized with Gabor Filters

When observing a fully trained CNN, researchers have found that the patterns in the kernel filters (convolution windows) of the receptive convolutional layer closely resemble Gabor filters. Gabor filters have existed for a long time, and researchers have been using them for texture analysis. Given the nature and purpose of the receptive layer of a CNN, Gabor filters could be a suitable replacement for the randomly initialized kernels of that layer, potentially boosting performance regardless of the nature of the dataset. The findings in this thesis show that when low-level kernel filters are initialized with Gabor filters, there is a boost in accuracy, Area Under the ROC (Receiver Operating Characteristic) Curve (AUC), minimum loss, and, in some cases, speed, depending on the complexity of the dataset. [pdf, bib]
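As a rough illustration of the idea, the sketch below builds a bank of randomly configured Gabor kernels shaped like the weight tensor of a first convolutional layer. It is a minimal NumPy sketch, not the code used in the thesis: the function names, the parameter ranges, and the choice to repeat each kernel across input channels are all assumptions made for this example.

```python
import numpy as np


def gabor_kernel(size, theta, sigma=2.0, lambd=4.0, gamma=0.5, psi=0.0):
    """Real part of a 2-D Gabor filter on a size x size grid (size must be odd).

    g(x, y) = exp(-(x'^2 + gamma^2 * y'^2) / (2 * sigma^2)) * cos(2*pi*x'/lambd + psi)
    where (x', y') are the coordinates rotated by theta.
    """
    assert size % 2 == 1, "use an odd kernel size so the grid is centered"
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(np.float64)
    x_r = x * np.cos(theta) + y * np.sin(theta)
    y_r = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(x_r ** 2 + (gamma * y_r) ** 2) / (2.0 * sigma ** 2))
    carrier = np.cos(2.0 * np.pi * x_r / lambd + psi)
    return envelope * carrier


def gabor_bank(out_channels, in_channels, size, seed=0):
    """Weights of shape (out_channels, in_channels, size, size), one random
    Gabor configuration per output channel (ranges are illustrative only)."""
    rng = np.random.default_rng(seed)
    weights = np.empty((out_channels, in_channels, size, size))
    for o in range(out_channels):
        kernel = gabor_kernel(
            size,
            theta=rng.uniform(0.0, np.pi),   # orientation
            sigma=rng.uniform(1.5, 3.0),     # width of the Gaussian envelope
            lambd=rng.uniform(3.0, 6.0),     # wavelength of the carrier
            gamma=rng.uniform(0.3, 1.0),     # spatial aspect ratio
            psi=rng.uniform(0.0, np.pi),     # phase offset
        )
        weights[o] = kernel                  # same kernel for every input channel
    return weights


w = gabor_bank(out_channels=8, in_channels=3, size=7)
print(w.shape)
```

In a deep learning framework, an array like `w` would be copied into the weights of the first convolutional layer before training. The filters are then trained by gradient descent like any other weights.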