Giving Thanks for the Pioneering Advances in Machine Learning

As we gather around the table this Thanksgiving, it’s the perfect time to reflect on and express gratitude for the remarkable strides made in machine learning (ML) over recent years. These technical innovations have advanced the field and paved the way for countless applications that enhance our daily lives. Let’s check out some of the most influential ML architectures and algorithms for which we are thankful as a community.


1. The Transformer Architecture

Vaswani et al., 2017

We are grateful for the Transformer architecture, which revolutionized sequence modeling by introducing a novel attention mechanism, eliminating the reliance on recurrent neural networks (RNNs) for handling sequential data.

Key Components:

  • Self-Attention Mechanism: Computes representations of the input sequence by relating different positions via attention weights.
    \text{Attention}(Q, K, V) = \text{softmax}\left( \frac{Q K^\top}{\sqrt{d_k}} \right) V
  • Multi-Head Attention: Lets the model jointly attend to information from different representation subspaces at different positions by projecting queries, keys, and values h times with different learned linear projections. \text{MultiHead}(Q, K, V) = \text{Concat}(\text{head}_1, ..., \text{head}_h) W^O where each head is computed as: \text{head}_i = \text{Attention}(Q W_i^Q, K W_i^K, V W_i^V)
  • Positional Encoding: Adds information about the position of tokens in the sequence since the model lacks recurrence. \text{PE}_{(pos, 2i)} = \sin\left( \frac{pos}{10000^{2i/d_{\text{model}}}} \right) \text{PE}_{(pos, 2i+1)} = \cos\left( \frac{pos}{10000^{2i/d_{\text{model}}}} \right)

Significance: Enabled parallelization in sequence processing, leading to significant speed-ups and improved performance in tasks like machine translation and language modeling.
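The attention and positional-encoding formulas can be made concrete with a minimal NumPy sketch. It assumes unbatched 2-D inputs and an even d_model. It also omits the learned projections W_i^Q, W_i^K, and W_i^V, so Q = K = V = X. The function names are illustrative and do not come from any library.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max before exponentiating for numerical stability
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Q: (n_q, d_k), K: (n_k, d_k), V: (n_k, d_v)
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # (n_q, n_k) compatibility scores
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ V, weights         # (n_q, d_v) outputs and the attention weights

def positional_encoding(max_len, d_model):
    # PE[pos, 2i] = sin(pos / 10000^(2i/d_model)), PE[pos, 2i+1] = cos(same angle)
    pos = np.arange(max_len)[:, None]          # (max_len, 1)
    two_i = np.arange(0, d_model, 2)[None, :]  # (1, d_model // 2), the values 2i
    angles = pos / np.power(10000.0, two_i / d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8)) + positional_encoding(5, 8)  # 5 tokens, d_model = 8
out, attn = scaled_dot_product_attention(X, X, X)         # self-attention with Q = K = V = X
```

Multi-head attention would run this function h times on differently projected copies of X, concatenate the results, and then apply W^O.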


2. Bidirectional Encoder Representations from Transformers (BERT)

Devlin et al., 2018

We are thankful for BERT, which introduced a method for pre-training deep bidirectional representations by jointly conditioning on both left and right contexts in all layers.

Key Concepts:

  • Masked Language Modeling (MLM): Randomly masks tokens in the input and predicts them using the surrounding context. Loss Function: \mathcal{L}_{\text{MLM}} = -\sum_{t \in \mathcal{M}} \log P_{\theta}(x_t | x_{\backslash \mathcal{M}}) where \mathcal{M} is the set of masked positions.
  • Next Sentence Prediction (NSP): Predicts whether a given pair of sentences follows sequentially in the original text.

Significance: Achieved state-of-the-art results on a wide range of NLP tasks via fine-tuning, demonstrating the power of large-scale pre-training.
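Below is a minimal sketch of MLM masking that follows the recipe in the BERT paper. About 15% of positions are selected. Of those, 80% become [MASK], 10% become a random token, and 10% are left unchanged. The token IDs, mask_id, and vocabulary size are hypothetical. Marking unselected positions with label -1 is a convention chosen for this sketch.

```python
import numpy as np

def mask_tokens(tokens, mask_id, vocab_size, rng, mask_prob=0.15):
    tokens = np.asarray(tokens)
    inputs = tokens.copy()
    labels = np.full_like(tokens, -1)                 # -1 marks positions outside M
    selected = rng.random(tokens.shape) < mask_prob   # the masked set M
    labels[selected] = tokens[selected]               # targets x_t for t in M
    r = rng.random(tokens.shape)
    to_mask = selected & (r < 0.8)                    # 80%: replace with [MASK]
    to_random = selected & (r >= 0.8) & (r < 0.9)     # 10%: replace with a random token
    inputs[to_mask] = mask_id                         # remaining 10%: left unchanged
    inputs[to_random] = rng.integers(0, vocab_size, size=int(to_random.sum()))
    return inputs, labels

def mlm_loss(log_probs, labels):
    # L_MLM = -sum_{t in M} log P(x_t | corrupted input); log_probs: (seq_len, vocab)
    positions = np.flatnonzero(labels >= 0)
    return -log_probs[positions, labels[positions]].sum()

rng = np.random.default_rng(0)
tokens = rng.integers(5, 100, size=50)  # hypothetical token IDs; IDs below 5 reserved
inputs, labels = mask_tokens(tokens, mask_id=4, vocab_size=100, rng=rng)
```

Leaving some selected tokens unchanged or randomized means the model cannot rely on seeing [MASK], which never appears during fine-tuning.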


3. Generative Pre-trained Transformers (GPT) Series

Radford et al., 2018, 2019; Brown et al., 2020

We express gratitude for the GPT series, which leverages unsupervised pre-training on large corpora to generate human-like text.

Key Features:

  • Unidirectional Language Modeling: Predicts the next token x_t given previous tokens x_{<t}. Objective Function: \mathcal{L}_{\text{LM}} = -\sum_{t=1}^N \log P_{\theta}(x_t | x_{<t})
  • Decoder-Only Transformer Architecture: Utilizes masked self-attention to prevent the model from attending to future tokens.

Significance: Demonstrated the capability of large language models to perform few-shot learning, adapting to new tasks with minimal task-specific data.
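The decoder-only masking and the next-token objective can be sketched in NumPy. This illustration makes three simplifying assumptions: a single head, no learned projections, and log-probabilities that are already computed. It is not GPT's actual implementation.

```python
import numpy as np

def causal_self_attention(X):
    # X: (n, d); learned Q/K/V projections omitted, so Q = K = V = X
    n, d = X.shape
    scores = X @ X.T / np.sqrt(d)
    future = np.triu(np.ones((n, n), dtype=bool), k=1)  # True where key position > query
    scores = np.where(future, -np.inf, scores)           # block attention to future tokens
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)                             # exp(-inf) = 0
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ X, weights

def lm_loss(log_probs, tokens):
    # log_probs[t]: log-distribution predicted after reading tokens[:t + 1],
    # so it is scored against tokens[t + 1]; the first token has no prediction here
    targets = tokens[1:]
    return -log_probs[np.arange(len(targets)), targets].sum()

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 6))
out, w = causal_self_attention(X)
```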


4. Variational Autoencoders (VAEs)

Kingma and Welling, 2013

We appreciate VAEs for introducing a probabilistic approach to autoencoders, enabling generative modeling of complex data distributions.

Key Components:

  • Encoder Network: Learns an approximate posterior q_{\phi}(z|x).
  • Decoder Network: Reconstructs the input from latent variables z, modeling p_{\theta}(x|z).

Objective Function (Evidence Lower Bound – ELBO): \mathcal{L}(\theta, \phi; x) = -\text{KL}(q_{\phi}(z|x) \| p_{\theta}(z)) + \mathbb{E}_{q_{\phi}(z|x)}[\log p_{\theta}(x|z)] where p_{\theta}(z) is typically a standard normal prior \mathcal{N}(0, I).

Significance: Provided a framework for unsupervised learning of latent representations and generative modeling.
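Both ELBO terms are straightforward to compute once the encoder outputs a mean and a log-variance. This sketch assumes a diagonal Gaussian posterior q_{\phi}(z|x), a standard normal prior, and a Bernoulli decoder, which suits binary data. The encoder and decoder networks themselves are not shown.

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    # z = mu + sigma * eps, eps ~ N(0, I): randomness is moved into eps so gradients
    # can flow through mu and sigma in an autodiff framework
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    # Closed-form KL(N(mu, diag(sigma^2)) || N(0, I)), summed over latent dimensions
    return 0.5 * np.sum(mu**2 + np.exp(log_var) - 1.0 - log_var, axis=-1)

def elbo(x, recon_prob, mu, log_var, eps=1e-7):
    # Single-sample estimate; assumes a Bernoulli decoder, so
    # log p(x|z) = sum_i x_i log p_i + (1 - x_i) log(1 - p_i)
    p = np.clip(recon_prob, eps, 1 - eps)
    log_lik = np.sum(x * np.log(p) + (1 - x) * np.log(1 - p), axis=-1)
    return -kl_to_standard_normal(mu, log_var) + log_lik
```

During training, z = reparameterize(mu, log_var, rng) is passed to the decoder to produce recon_prob, and the negative ELBO is minimized with an autodiff framework. The reparameterization step is what allows gradients to reach the encoder.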


5. Generative Adversarial Networks (GANs)

Goodfellow et al., 2014

We are thankful for GANs, which consist of two neural networks, a generator G and a discriminator D, competing in a minimax game.

Objective Function: \min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))] where p_{\text{data}} is the data distribution and p_z is the prior over the latent space.

Significance: Enabled the generation of highly realistic synthetic data, impacting image synthesis, data augmentation, and more.
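A minibatch Monte Carlo estimate of V(D, G) needs only a few lines, given the discriminator's outputs. This sketch assumes D outputs probabilities in (0, 1). It also includes the non-saturating generator loss that Goodfellow et al. recommend in practice. When the generator matches the data distribution, the optimal discriminator outputs 1/2 everywhere and V equals -log 4.

```python
import numpy as np

def gan_value(d_real, d_fake, eps=1e-12):
    # Minibatch estimate of V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))]
    d_real = np.clip(d_real, eps, 1 - eps)   # D's outputs on real samples x
    d_fake = np.clip(d_fake, eps, 1 - eps)   # D's outputs on generated samples G(z)
    return np.mean(np.log(d_real)) + np.mean(np.log(1 - d_fake))

def discriminator_loss(d_real, d_fake):
    # D maximizes V, so the loss it minimizes is -V
    return -gan_value(d_real, d_fake)

def generator_loss_nonsaturating(d_fake, eps=1e-12):
    # G maximizes log D(G(z)) instead of minimizing log(1 - D(G(z))),
    # which gives stronger gradients early in training when D rejects fakes easily
    return -np.mean(np.log(np.clip(d_fake, eps, 1 - eps)))
```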


6. Deep Reinforcement Learning

Mnih et al., 2015; Silver et al., 2016

We give thanks for the combination of deep learning with reinforcement learning, leading to agents capable of performing complex tasks.

Key Algorithms:

  • Deep Q-Networks (DQN): Approximate the action-value function Q(s, a; \theta) with a neural network trained to minimize (y - Q(s, a; \theta))^2, where the Bellman target is y = r + \gamma \max_{a'} Q(s', a'; \theta^{-}) and \theta^{-} are the parameters of a periodically updated target network.
  • Policy Gradient Methods: Optimize the policy \pi_{\theta}(a|s) directly. REINFORCE Gradient Estimate: \nabla_{\theta} J(\theta) = \mathbb{E}_{\pi_{\theta}} \left[ \nabla_{\theta} \log \pi_{\theta}(a|s) R \right] where R is the cumulative (discounted) reward.

Significance: Achieved human-level performance on Atari games and defeated top professional players at Go, advancing AI in sequential decision-making tasks.
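The DQN target and the returns used by REINFORCE can be sketched as follows. The sketch assumes that Q-values and log-probabilities come from networks not shown here. It uses the per-step reward-to-go R_t, a common lower-variance variant of the total return R in the formula above.

```python
import numpy as np

def dqn_targets(rewards, next_q_target, dones, gamma=0.99):
    # y = r + gamma * max_a' Q(s', a'; theta^-), with no bootstrapping past terminal states
    # next_q_target: (batch, n_actions) Q-values of s' from the target network
    return rewards + gamma * (1.0 - dones) * next_q_target.max(axis=1)

def discounted_returns(rewards, gamma=0.99):
    # Reward-to-go R_t = r_t + gamma * R_{t+1}, computed backwards over one episode
    returns = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

def reinforce_loss(log_probs_taken, returns):
    # Surrogate whose gradient (via autodiff) is -sum_t grad log pi(a_t|s_t) R_t,
    # i.e. minimizing it follows the REINFORCE gradient estimate
    return -np.sum(log_probs_taken * returns)
```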


7. Normalization Techniques

We are grateful for normalization techniques that have improved training stability and performance of deep networks.

  • Batch Normalization (Ioffe and Szegedy, 2015) Formula: \hat{x}_i = \frac{x_i - \mu_{\mathcal{B}}}{\sqrt{\sigma_{\mathcal{B}}^2 + \epsilon}} where \mu_{\mathcal{B}} and \sigma_{\mathcal{B}}^2 are the batch mean and variance.
  • Layer Normalization (Ba et al., 2016) Formula: \hat{x}_i = \frac{x_i - \mu}{\sqrt{\sigma^2 + \epsilon}} where \mu and \sigma^2 are computed over the features of a single sample.

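Significance: Stabilized the optimization of deep networks, enabling faster and more reliable training (batch normalization was originally motivated as a way to reduce internal covariate shift).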
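The two formulas differ only in the axis over which the statistics are computed, and a NumPy sketch makes that explicit. Both papers also apply a learnable scale \gamma and shift \beta after normalizing. The running averages that batch normalization uses at inference time are omitted here.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    # x: (batch, features). Statistics per feature, computed across the batch (axis 0)
    mu = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta   # learnable scale and shift

def layer_norm(x, gamma, beta, eps=1e-5):
    # Statistics per sample, computed across its features (last axis); batch-size independent
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=5.0, size=(4, 3))
bn = batch_norm(x, gamma=1.0, beta=0.0)
ln = layer_norm(x, gamma=1.0, beta=0.0)
```

Because layer normalization never looks across the batch, it behaves identically for a batch of one, which is one reason Transformers use it.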


8. Attention Mechanisms in Neural Networks

Bahdanau et al., 2014; Luong et al., 2015

We appreciate attention mechanisms for allowing models to focus on specific parts of the input when generating each output element.

Key Concepts:

  • Alignment Scores: Compute the relevance between encoder hidden states h_{\text{enc}} and decoder state s_{\text{dec}}. Common Score Functions:
    • Dot-product: \text{score}(h, s) = h^\top s
    • Additive (Bahdanau attention): \text{score}(h, s) = v_a^\top \tanh(W_a [h; s])
  • Context Vector: c_t = \sum_{i=1}^T \alpha_{t,i} h_i where the attention weights \alpha_{t,i} are computed as: \alpha_{t,i} = \frac{\exp(\text{score}(h_i, s_{t-1}))}{\sum_{k=1}^T \exp(\text{score}(h_k, s_{t-1}))}

Significance: Enhanced performance in sequence-to-sequence tasks by allowing models to utilize information from all input positions.
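Here is a sketch of one decoding step of additive (Bahdanau-style) attention. Randomly initialized W_a and v_a stand in for learned parameters, and the weight shapes noted in the comments are assumptions of this sketch.

```python
import numpy as np

def additive_attention(H, s_prev, W_a, v_a):
    # H: (T, d_h) encoder states h_i; s_prev: (d_s,) decoder state s_{t-1}
    # W_a: (d_a, d_h + d_s); v_a: (d_a,)
    T = H.shape[0]
    concat = np.concatenate([H, np.tile(s_prev, (T, 1))], axis=1)  # [h_i; s_{t-1}]
    scores = np.tanh(concat @ W_a.T) @ v_a                         # (T,) alignment scores
    scores = scores - scores.max()
    alpha = np.exp(scores) / np.exp(scores).sum()                  # attention weights
    context = alpha @ H                                            # c_t = sum_i alpha_i h_i
    return context, alpha

rng = np.random.default_rng(0)
H = rng.normal(size=(6, 4))   # 6 source positions, d_h = 4
s_prev = rng.normal(size=3)   # d_s = 3
context, alpha = additive_attention(H, s_prev, rng.normal(size=(5, 7)), rng.normal(size=5))
```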


9. Graph Neural Networks (GNNs)

Scarselli et al., 2009; Kipf and Welling, 2016

We are thankful for GNNs, which extend neural networks to graph-structured data, enabling the modeling of relational information.

Message Passing Framework:

  • Node Representation Update: h_v^{(k)} = \sigma \left( \sum_{u \in \mathcal{N}(v)} W h_u^{(k-1)} + W_0 h_v^{(k-1)} \right) where:
    • h_v^{(k)} is the representation of node v at layer k.
    • \mathcal{N}(v) is the set of neighbors of node v.
    • W and W_0 are learnable weight matrices.
    • \sigma is a nonlinear activation function.
  • Graph Convolutional Networks (GCNs): H^{(k+1)} = \sigma \left( \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} H^{(k)} W^{(k)} \right) where:
    • \tilde{A} = A + I is the adjacency matrix with added self-loops.
    • \tilde{D} is the degree matrix of \tilde{A}.

Significance: Enabled advancements in social network analysis, molecular chemistry, and recommendation systems.
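The GCN propagation rule maps directly onto matrix operations. This dense-matrix sketch assumes a small undirected graph and uses ReLU as the nonlinearity \sigma. Large graphs would normally use sparse matrices instead.

```python
import numpy as np

def gcn_layer(A, H, W, activation=lambda z: np.maximum(z, 0.0)):
    # A: (n, n) adjacency without self-loops; H: (n, d_in); W: (d_in, d_out)
    A_tilde = A + np.eye(A.shape[0])           # add self-loops
    deg = A_tilde.sum(axis=1)                  # diagonal entries of D_tilde
    D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
    A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt  # symmetric normalization
    return activation(A_hat @ H @ W)

rng = np.random.default_rng(0)
A = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])                # path graph 0-1-2
H_next = gcn_layer(A, rng.normal(size=(3, 4)), rng.normal(size=(4, 2)))
```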


10. Self-Supervised Learning and Contrastive Learning

He et al., 2020; Chen et al., 2020

We are grateful for self-supervised learning techniques that leverage unlabeled data by creating surrogate tasks.

Contrastive Learning Objective:

  • InfoNCE Loss: \mathcal{L}_{i,j} = -\log \frac{\exp(\text{sim}(z_i, z_j)/\tau)}{\sum_{k=1}^{2N} \textbf{1}_{[k \neq i]} \exp(\text{sim}(z_i, z_k)/\tau)} where:
    • z_i and z_j are representations of two augmented views of the same sample.
    • \text{sim}(u, v) = \frac{u^\top v}{\|u\| \|v\|} is the cosine similarity.
    • \tau is a temperature parameter.
    • \textbf{1}_{[k \neq i]} is an indicator function equal to 1 when k \neq i.

Significance: Improved representation learning, producing state-of-the-art results on computer vision benchmarks from representations pre-trained without labeled data.
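Below is a NumPy sketch of the InfoNCE (NT-Xent) loss above. It assumes two batches of embeddings in which row k of each batch is a view of sample k. As in SimCLR (Chen et al., 2020), it averages \mathcal{L}_{i,j} over all 2N anchors. The encoder and the augmentations are not shown.

```python
import numpy as np

def nt_xent_loss(z1, z2, tau=0.5):
    # z1[k] and z2[k] embed two augmented views of sample k; shapes (N, d)
    n = z1.shape[0]
    z = np.concatenate([z1, z2], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # unit vectors: dot = cosine sim
    sim = z @ z.T / tau                                 # (2N, 2N) sim(z_i, z_k) / tau
    np.fill_diagonal(sim, -np.inf)                      # the indicator 1[k != i]
    positive = np.concatenate([np.arange(n, 2 * n), np.arange(n)])  # partner of each i
    row_max = sim.max(axis=1, keepdims=True)
    log_denom = row_max[:, 0] + np.log(np.exp(sim - row_max).sum(axis=1))
    losses = log_denom - sim[np.arange(2 * n), positive]  # L_{i,j} for every anchor i
    return losses.mean()

rng = np.random.default_rng(0)
z1 = rng.normal(size=(8, 16))
aligned = nt_xent_loss(z1, z1 + 0.01 * rng.normal(size=(8, 16)))  # near-identical views
unrelated = nt_xent_loss(z1, rng.normal(size=(8, 16)))           # views share nothing
```

The loss is lower when each view's partner is its most similar embedding, which is exactly the pressure that pulls augmented views of the same sample together.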


11. Differential Privacy in Machine Learning

Abadi et al., 2016

We give thanks for techniques that allow training models while preserving the privacy of individual data points.

Differential Privacy Guarantee:

  • Definition: A randomized algorithm \mathcal{A} provides (\epsilon, \delta)-differential privacy if for all datasets D and D' differing on one element, and all measurable subsets S: P[\mathcal{A}(D) \in S] \leq e^\epsilon P[\mathcal{A}(D') \in S] + \delta
  • Noise Addition: DP-SGD clips each per-example gradient to a maximum L2 norm and adds calibrated Gaussian noise before each update, bounding the influence of any single training example.

Significance: Enabled the deployment of machine learning models in privacy-sensitive applications.
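A single noisy gradient step in the style of Abadi et al.'s DP-SGD can be sketched as follows. The sketch assumes per-example gradients are already available. It omits the privacy accounting (the moments accountant) that converts the noise level into a concrete (\epsilon, \delta) guarantee.

```python
import numpy as np

def dp_sgd_step(theta, per_example_grads, lr, clip_norm, noise_multiplier, rng):
    # per_example_grads: (batch, n_params), one gradient row per training example
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    # Rescale each gradient so its L2 norm is at most clip_norm
    clipped = per_example_grads / np.maximum(1.0, norms / clip_norm)
    # Gaussian noise with std = noise_multiplier * clip_norm, added to the summed gradient
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=theta.shape)
    noisy_mean = (clipped.sum(axis=0) + noise) / per_example_grads.shape[0]
    return theta - lr * noisy_mean
```

Clipping is what makes the noise meaningful: it caps how much any one example can move the update, so a fixed amount of noise can mask its contribution.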


12. Federated Learning

McMahan et al., 2017

We are thankful for federated learning, which allows training models across multiple decentralized devices while keeping data localized.

Federated Averaging Algorithm:

  1. Local Update: Each client k updates model parameters \theta using local data D_k: \theta_k^{t+1} = \theta^t - \eta \nabla_{\theta} \mathcal{L}(\theta^t; D_k)
  2. Global Aggregation: The server aggregates updates: \theta^{t+1} = \sum_{k=1}^K \frac{n_k}{n} \theta_k^{t+1} where:
    • n_k is the number of samples at client k.
    • n = \sum_{k=1}^K n_k is the total number of samples across all clients.

Significance: Addressed privacy concerns and bandwidth limitations in distributed systems.
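Here is a toy sketch of federated averaging. It assumes each client's parameters are a NumPy array and uses a hypothetical gradient function for a simple mean-estimation objective. Only parameters leave each client; the raw data never does.

```python
import numpy as np

def mse_grad(theta, data):
    # Hypothetical toy objective: gradient of 0.5 * mean ||theta - x||^2 over local data
    return theta - data.mean(axis=0)

def local_update(theta, grad_fn, data, lr=0.1, steps=1):
    # Client-side gradient steps on local data D_k (steps=1 matches the update above)
    for _ in range(steps):
        theta = theta - lr * grad_fn(theta, data)
    return theta

def fed_avg(client_params, client_sizes):
    # theta^{t+1} = sum_k (n_k / n) theta_k^{t+1}
    sizes = np.asarray(client_sizes, dtype=float)
    weights = sizes / sizes.sum()
    return np.tensordot(weights, np.stack(client_params), axes=1)

theta = np.zeros(2)
client_data = [np.array([[1.0, 1.0]]), np.array([[4.0, 4.0], [4.0, 4.0]])]  # stays on-device
client_params = [local_update(theta, mse_grad, d, lr=1.0) for d in client_data]
theta = fed_avg(client_params, [len(d) for d in client_data])
```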


13. Neural Architecture Search (NAS)

Zoph and Le, 2016

We appreciate NAS for automating the design of neural network architectures using optimization algorithms.

Approaches:

  • Reinforcement Learning-Based NAS: Uses an RNN controller to generate architectures, trained to maximize expected validation accuracy.
  • Differentiable NAS (DARTS): Models the architecture search space as continuous, enabling gradient-based optimization. Objective Function: \min_{\alpha} \mathcal{L}_{\text{val}}(w^*(\alpha), \alpha) where w^*(\alpha) is obtained by: w^*(\alpha) = \arg\min_w \mathcal{L}_{\text{train}}(w, \alpha)

Significance: Reduced human effort in designing architectures, leading to efficient and high-performing models.
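DARTS makes the search space continuous by replacing each discrete choice with a softmax-weighted mixture of candidate operations. The sketch below shows that relaxation for a single edge with three hypothetical operations. The bilevel optimization of w and \alpha is not shown.

```python
import numpy as np

def softmax(a):
    e = np.exp(a - np.max(a))
    return e / e.sum()

# Hypothetical candidate operations for one edge of a search cell
CANDIDATE_OPS = [
    lambda x: x,                   # identity (skip connection)
    lambda x: np.zeros_like(x),    # "zero" operation (no connection)
    lambda x: np.maximum(x, 0.0),  # stand-in for a parametric op such as a convolution
]

def mixed_op(x, alpha):
    # Continuous relaxation: o_bar(x) = sum_o softmax(alpha)_o * o(x)
    weights = softmax(alpha)
    return sum(w * op(x) for w, op in zip(weights, CANDIDATE_OPS))

def derive_op(alpha):
    # After search, discretize by keeping the highest-weighted operation
    # (DARTS also excludes the zero operation at this step; omitted here)
    return int(np.argmax(alpha))
```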


14. Optimizer Advancements (Adam, AdaBound, RAdam)

We are thankful for advancements in optimization algorithms that improved training efficiency.

  • Adam Optimizer (Kingma and Ba, 2014)
    Update Rules: m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t, v_t = \beta_2 v_{t-1} + (1 - \beta_2) g_t^2, \hat{m}_t = \frac{m_t}{1 - \beta_1^t}, \hat{v}_t = \frac{v_t}{1 - \beta_2^t}, \theta_{t+1} = \theta_t - \eta \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}
    where:
    • g_t is the gradient at time step t.
    • \beta_1 and \beta_2 are hyperparameters controlling the exponential decay rates.
    • \eta is the learning rate.
    • \epsilon is a small constant to prevent division by zero.

Significance: Improved optimization efficiency and convergence in training deep neural networks.
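The Adam update rules above translate into code line by line. This minimal NumPy sketch uses the defaults suggested by Kingma and Ba: \eta = 10^{-3}, \beta_1 = 0.9, \beta_2 = 0.999, and \epsilon = 10^{-8}. The toy objective (\theta - 3)^2 in the usage lines is purely illustrative.

```python
import numpy as np

class Adam:
    def __init__(self, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = None  # first-moment estimate m_t
        self.v = None  # second-moment estimate v_t
        self.t = 0

    def step(self, theta, g):
        if self.m is None:
            self.m = np.zeros_like(theta)
            self.v = np.zeros_like(theta)
        self.t += 1
        self.m = self.beta1 * self.m + (1 - self.beta1) * g
        self.v = self.beta2 * self.v + (1 - self.beta2) * g**2
        m_hat = self.m / (1 - self.beta1**self.t)   # bias correction
        v_hat = self.v / (1 - self.beta2**self.t)
        return theta - self.lr * m_hat / (np.sqrt(v_hat) + self.eps)

opt = Adam(lr=0.1)
theta1 = opt.step(np.array([0.0]), np.array([-6.0]))  # gradient of (theta - 3)^2 at 0
theta2 = opt.step(theta1, 2 * (theta1 - 3.0))
```

Because of the bias correction, the first update has a magnitude close to the learning rate, whatever the scale of the gradient.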


15. Diffusion Models for Generative Modeling

Ho et al., 2020; Song et al., 2020

We give thanks for diffusion models, which are generative models that learn data distributions by reversing a diffusion (noising) process.

Key Concepts:

  • Forward Diffusion Process: Gradually adds Gaussian noise to data over T timesteps.
    Noising Schedule: q(x_t | x_{t-1}) = \mathcal{N}(x_t; \sqrt{1 - \beta_t} x_{t-1}, \beta_t I)
  • Reverse Process: Learns to denoise from x_T back to x_0.
    Objective Function: \mathcal{L}_{\text{simple}} = \mathbb{E}_{t, x_0, \epsilon} \left[ \| \epsilon - \epsilon_\theta(x_t, t) \|^2 \right] where:
    • \epsilon is the noise added to the data.
    • \epsilon_\theta(x_t, t) is the model’s prediction of the noise at timestep t.

Significance: Achieved state-of-the-art results in image generation, rivaling GANs without their training instability.
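Composing the per-step Gaussian noising yields a closed form, x_t = \sqrt{\bar\alpha_t} x_0 + \sqrt{1 - \bar\alpha_t}\,\epsilon with \bar\alpha_t = \prod_{s \le t} (1 - \beta_s), so training can jump to any timestep directly. Below is a minimal sketch of this, assuming the linear \beta schedule from Ho et al. (2020) and a noise-prediction model passed in as a function. Timesteps are zero-indexed here.

```python
import numpy as np

def linear_beta_schedule(T=1000, beta_start=1e-4, beta_end=0.02):
    # Linear variance schedule, as used by Ho et al. (2020)
    return np.linspace(beta_start, beta_end, T)

def q_sample(x0, t, alpha_bar, noise):
    # Closed-form forward process: sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise

def simple_loss(eps_model, x0, alpha_bar, rng):
    # L_simple: sample t and eps, noise x0, and regress the model's output onto eps
    t = rng.integers(0, len(alpha_bar))
    noise = rng.standard_normal(x0.shape)
    x_t = q_sample(x0, t, alpha_bar, noise)
    return np.mean((noise - eps_model(x_t, t)) ** 2)

betas = linear_beta_schedule()
alpha_bar = np.cumprod(1.0 - betas)  # alpha_bar_t = prod_{s<=t} (1 - beta_s)
x0 = np.ones(4)
```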


Give Thanks…

This Thanksgiving, let’s celebrate and express our gratitude for these groundbreaking contributions to machine learning. These technical advancements have not only pushed the boundaries of what’s possible but have also laid the foundation for future innovations that will continue to shape our world.

May we continue to build upon these foundations and contribute to the growing field of machine learning.

References

  • Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is All You Need. Advances in Neural Information Processing Systems. arXiv:1706.03762
  • Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv:1810.04805
  • Radford, A., Narasimhan, K., Salimans, T., & Sutskever, I. (2018). Improving Language Understanding by Generative Pre-training. OpenAI Blog.
  • Kingma, D. P., & Welling, M. (2013). Auto-Encoding Variational Bayes. arXiv:1312.6114
  • Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., & Bengio, Y. (2014). Generative Adversarial Nets. Advances in Neural Information Processing Systems.
  • Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., Graves, A., Riedmiller, M., Fidjeland, A. K., Ostrovski, G., Petersen, S., Beattie, C., Sadik, A., Antonoglou, I., King, H., Kumaran, D., Wierstra, D., Legg, S., & Hassabis, D. (2015). Human-level Control through Deep Reinforcement Learning. Nature.
  • Ioffe, S., & Szegedy, C. (2015). Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. International Conference on Machine Learning (ICML).
  • Ba, J. L., Kiros, J. R., & Hinton, G. E. (2016). Layer Normalization. arXiv:1607.06450
  • Bahdanau, D., Cho, K., & Bengio, Y. (2014). Neural Machine Translation by Jointly Learning to Align and Translate. arXiv:1409.0473
  • Kipf, T. N., & Welling, M. (2016). Semi-Supervised Classification with Graph Convolutional Networks. arXiv:1609.02907
  • He, K., Fan, H., Wu, Y., Xie, S., & Girshick, R. (2020). Momentum Contrast for Unsupervised Visual Representation Learning. arXiv:1911.05722
  • Abadi, M., Chu, A., Goodfellow, I., McMahan, H. B., Mironov, I., Talwar, K., & Zhang, L. (2016). Deep Learning with Differential Privacy. ACM SIGSAC Conference on Computer and Communications Security.
  • McMahan, H. B., Moore, E., Ramage, D., Hampson, S., & Agüera y Arcas, B. (2017). Communication-Efficient Learning of Deep Networks from Decentralized Data. International Conference on Artificial Intelligence and Statistics (AISTATS). arXiv:1602.05629
  • Zoph, B., & Le, Q. V. (2016). Neural Architecture Search with Reinforcement Learning. arXiv:1611.01578
  • Kingma, D. P., & Ba, J. (2014). Adam: A Method for Stochastic Optimization. arXiv:1412.6980
  • Ho, J., Jain, A., & Abbeel, P. (2020). Denoising Diffusion Probabilistic Models. arXiv:2006.11239

DiPol-GAN: A Trailblazer in Molecular Graph Generation

Five years ago, a groundbreaking paper introduced DiPol-GAN, a generative adversarial network (GAN) designed to create molecular graphs with specific chemical properties. Authored by Michael Guarino et al. in 2019, this work remains a testament to the ingenuity at the intersection of machine learning and computational chemistry. Let’s dive into its contributions and why it continues to be influential.


From Molecules to Meaning: DiPol-GAN’s Key Innovations

DiPol-GAN broke new ground by addressing the complexities of molecular graph generation—a task requiring precision and adherence to chemical constraints. Here are the standout elements of this innovative approach:

  1. Direct Graph Representations
    While many models relied on SMILES strings to describe molecular structures, DiPol-GAN embraced the inherent richness of graph representations. Nodes represented atoms, edges captured bonds, and the graph structure preserved the relational nature of molecules—key for meaningful chemical property optimization.
  2. Hierarchical Reasoning with DIFFPOOL
    The paper introduced the use of Differentiable Pooling (DIFFPOOL) in the GAN discriminator. DIFFPOOL hierarchically aggregated graph nodes, allowing the model to extract high-level features and improve classification performance. Moreover, the authors adapted DIFFPOOL to handle multi-relational graphs, capturing the nuances of bond types—an essential feature for molecular modeling.
  3. Reinforcement Learning for Targeted Molecules
    DiPol-GAN incorporated a policy network to nudge the generative process toward molecules with desired properties, like solubility (logP) or drug-likeness. This clever integration of reinforcement learning allowed the model to focus on chemically relevant outcomes, setting a precedent for property-driven molecular design.
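DIFFPOOL's coarsening step (Ying et al., 2018) can be sketched with dense matrices. One GNN produces node embeddings Z, and another produces a soft cluster-assignment matrix S. The graph is then pooled as X' = S^\top Z and A' = S^\top A S. This sketch handles only a single adjacency matrix, so it does not reproduce DiPol-GAN's multi-relational variant. Its one-layer GNN is a hypothetical stand-in.

```python
import numpy as np

def softmax_rows(x):
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def gnn(A, X, W):
    # Hypothetical one-layer GNN: sum over neighbors and self, project, ReLU
    return np.maximum((A + np.eye(A.shape[0])) @ X @ W, 0.0)

def diffpool(A, X, W_embed, W_pool):
    # A: (n, n) adjacency; X: (n, d); W_pool has one column per cluster
    Z = gnn(A, X, W_embed)               # node embeddings
    S = softmax_rows(gnn(A, X, W_pool))  # soft assignment of each node to clusters
    X_coarse = S.T @ Z                   # pooled cluster features
    A_coarse = S.T @ A @ S               # pooled (weighted) cluster adjacency
    return X_coarse, A_coarse, S

rng = np.random.default_rng(0)
A = np.roll(np.eye(5), 1, axis=1) + np.roll(np.eye(5), -1, axis=1)  # 5-node ring
X = rng.normal(size=(5, 4))
X_c, A_c, S = diffpool(A, X, rng.normal(size=(4, 3)), rng.normal(size=(4, 2)))
```

Because every row of S sums to 1, pooling preserves the total edge weight of the graph, and the whole operation stays differentiable, so a discriminator can learn the assignment end to end.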

Five Years Later: Why DiPol-GAN Still Resonates

Even as research in graph neural networks and molecular AI progresses, DiPol-GAN’s contributions remain strikingly relevant:

  • Raising the Bar for Molecular GANs: By addressing the dual challenges of graph isomorphism and multi-relational edges, this model set a high standard for graph-based GANs.
  • Chemical Property Alignment: The integration of reinforcement learning into graph generation helped pave the way for modern approaches to property-targeted molecule design.
  • Benchmark Metrics: The study’s rigorous evaluation on validity, uniqueness, and novelty using the QM9 dataset provided benchmarks that still guide research today.

Continuing the Journey

DiPol-GAN’s legacy reminds us of the powerful synergy between machine learning and chemistry. Whether you’re exploring novel graph neural network architectures or advancing property-driven molecule generation, this paper offers invaluable insights.

For those working in related domains, revisiting this milestone study could spark new ideas and breakthroughs. Be sure to include it in your references to honor its influence and acknowledge the innovation it brought to the field:

Guarino, M., Shah, A., & Rivas, P. (2019). DiPol-GAN: Generating Molecular Graphs Adversarially with Relational Differentiable Pooling. Presented at the LatinX in AI (LXAI) Workshop at the Conference on Neural Information Processing Systems (NeurIPS), pp. 9.

Let’s keep building on this foundation, advancing science one molecule at a time.

Uncovering Patterns in Car Parts – A Step Towards Combating a Cybercrime

The black market for stolen car parts is a significant problem, exacerbated by the rise of online marketplaces like Craigslist or OfferUp, where stolen goods are often sold under the radar. In response to this growing issue, our research team at Baylor University has been leveraging cutting-edge AI techniques to detect patterns in car part sales that could signal illicit activity. This work is part of the NSF-funded Research Experiences for Undergraduates (REU) program, which provides undergraduate students with hands-on research experience in critical areas like artificial intelligence. Our project, supported by NSF Grant #2210091, investigates the potential of deep learning models to analyze vast amounts of data from online listings, offering a new tool in the fight against stolen car parts.

Why This Research Matters

The theft and resale of car parts not only affect vehicle owners but also contribute to organized crime. Detecting patterns in how stolen parts are sold online can help law enforcement track and dismantle these criminal networks. This project also presents a unique challenge to the AI research community: the complexity of analyzing unstructured, noisy data from real-world platforms. By utilizing the Vision Transformer (ViT) for image analysis, our research offers a different approach compared to previous works that employed multimodal models like ImageBind and OpenFlamingo.

Dataset and Embedding Extraction

Our dataset comprises thousands of car parts advertisements scraped from Craigslist and OfferUp, each including images and textual descriptions. To process the image data, we used the Vision Transformer (ViT), a model pre-trained on ImageNet-21k. ViT processes images by splitting them into 16×16-pixel patches, allowing for the extraction of key features from each image. These features were converted into embeddings—high-dimensional vectors that represent each image’s content in a form that the model can analyze.
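The patch-splitting step can be illustrated in a few lines of NumPy. This sketch shows only how an image is cut into flattened 16×16 patches. It assumes a 224×224 RGB input, a common ViT input size. It does not include the learned linear projection and Transformer layers that turn patches into the embeddings used here.

```python
import numpy as np

def patchify(image, patch_size=16):
    # image: (H, W, C) with H and W divisible by patch_size
    H, W, C = image.shape
    p = patch_size
    patches = image.reshape(H // p, p, W // p, p, C)  # split both spatial axes
    patches = patches.transpose(0, 2, 1, 3, 4)        # (H/p, W/p, p, p, C)
    return patches.reshape(-1, p * p * C)             # one flattened row per patch, row-major

image = np.arange(224 * 224 * 3, dtype=np.float32).reshape(224, 224, 3)
patches = patchify(image)  # 14 x 14 = 196 patches of 16 * 16 * 3 = 768 values
```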

We extracted embeddings for nearly 85,000 images, which were then compiled into a CSV file for further analysis, including clustering and visualization. Unlike prior works by Hamara & Rivas (2024) and Rashid & Rivas (2024), which utilized multimodal models like ImageBind and OpenFlamingo to fuse image and text data, we focused solely on image embeddings in this phase to assess the effectiveness of ViT in capturing visual patterns related to illicit activities.

Clustering and Evaluation

With the embeddings extracted, we used UMAP (Uniform Manifold Approximation and Projection) to project the high-dimensional data into a more interpretable 2D space for visualization. We then applied K-Means clustering, a widely used algorithm for grouping data, and experimented with different embedding dimensions (16, 32, 64, and 128) while searching for the optimal number of clusters.

Among these, 64 dimensions proved to be the best suited for our dataset, as determined by three key clustering performance metrics:

  • Silhouette Score: Measures how similar an object is to its own cluster compared to other clusters, on a scale from -1 to 1 (higher is better). Our value of 0.015, close to zero, indicated that some clusters were poorly defined.
  • Calinski-Harabasz Index: The ratio of between-cluster dispersion to within-cluster dispersion (higher is better).
  • Davies-Bouldin Index: The average similarity between each cluster and its most similar cluster, where similarity compares within-cluster scatter to the distance between centroids (lower is better).
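For reference, the three metrics can be written out in NumPy from their standard definitions. In practice, scikit-learn's silhouette_score, calinski_harabasz_score, and davies_bouldin_score compute the same quantities. This dense sketch builds an n × n distance matrix, so it suits small samples only, not the full set of 85,000 embeddings.

```python
import numpy as np

def silhouette(X, labels):
    # s(i) = (b - a) / max(a, b); requires at least two clusters
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise distances
    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        if same.sum() == 1:
            continue  # convention: s(i) = 0 for a singleton cluster
        a = D[i, same].sum() / (same.sum() - 1)  # mean distance to own cluster, excluding i
        b = min(D[i, labels == k].mean() for k in np.unique(labels) if k != labels[i])
        scores[i] = (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return scores.mean()

def calinski_harabasz(X, labels):
    # (between-cluster dispersion / (k - 1)) / (within-cluster dispersion / (n - k))
    ks = np.unique(labels)
    mean = X.mean(axis=0)
    between = within = 0.0
    for k in ks:
        Xk = X[labels == k]
        ck = Xk.mean(axis=0)
        between += len(Xk) * np.sum((ck - mean) ** 2)
        within += np.sum((Xk - ck) ** 2)
    return (between / (len(ks) - 1)) / (within / (len(X) - len(ks)))

def davies_bouldin(X, labels):
    # Mean over clusters i of max_{j != i} (s_i + s_j) / d(c_i, c_j)
    ks = np.unique(labels)
    cents = np.array([X[labels == k].mean(axis=0) for k in ks])
    scatter = np.array([np.linalg.norm(X[labels == k] - c, axis=1).mean()
                        for k, c in zip(ks, cents)])
    ratios = [max((scatter[i] + scatter[j]) / np.linalg.norm(cents[i] - cents[j])
                  for j in range(len(ks)) if j != i) for i in range(len(ks))]
    return float(np.mean(ratios))

X = np.array([[0.0], [1.0], [10.0], [11.0]])  # two tight, well-separated clusters
labels = np.array([0, 0, 1, 1])
```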

Although 128 dimensions performed well in some tests, 64 dimensions provided the clearest balance between cluster purity and computational efficiency. The low silhouette score indicates considerable overlap between clusters. Even so, most clusters appeared well-defined on inspection, apart from several outliers: posts that displayed mixed or unclear features, such as images showing both powertrains and vehicle exteriors.

Findings and Analysis

Using the K-Means algorithm, we identified 20 distinct clusters, each representing different categories of car parts. Here are some key findings:

  • Cluster 0: Primarily contained exterior shots of full vehicles.
  • Cluster 1: Featured exterior components like mirrors and bumpers.
  • Cluster 2: Focused on powertrain parts such as engines and transmissions.
  • Cluster 3: Showcased body panels including doors, trunks, and hoods.
  • Cluster 4: Grouped images of towing accessories like trailer hitches.

After clustering, we applied K-Nearest Neighbors (KNN) to identify the top 10 posts nearest to each cluster centroid, which allowed us to analyze representative posts and confirm the coherence of each cluster. Despite the general success of this approach, outliers emerged in the UMAP visualization, indicating the need for further refinement to handle posts with mixed features. This challenge is common in image analysis, particularly when models rely solely on visual data without the contextual information that multimodal models can provide.
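Below is a minimal sketch of the clustering and representative-post steps. It uses Lloyd's K-Means followed by a brute-force nearest-neighbor query against each centroid. Random 64-dimensional data stands in for the ViT embeddings, and k = 2 replaces the 20 clusters used here. At scale, scikit-learn's KMeans and NearestNeighbors would be the usual choice.

```python
import numpy as np

def kmeans(X, k, rng, n_iter=100):
    # Lloyd's algorithm, initialized with k distinct data points chosen at random
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)                     # assign to nearest centroid
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)                             # keep empty clusters in place
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    labels = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=-1).argmin(axis=1)
    return labels, centroids

def nearest_to_centroids(X, centroids, top_k=10):
    # Brute-force k-nearest-neighbor query: the top_k rows of X closest to each centroid
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=-1)  # (n, k)
    return np.argsort(dists, axis=0)[:top_k].T                              # (k, top_k)

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, size=(20, 64)),    # stand-in "embeddings", group A
               rng.normal(5.0, 0.5, size=(20, 64))])   # group B
labels, centroids = kmeans(X, k=2, rng=rng)
representatives = nearest_to_centroids(X, centroids, top_k=10)
```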

UMAP Visualization for 64 dimensions

Comparative Analysis with Prior Work

Our approach contrasts with that of Hamara & Rivas (2024) and Rashid & Rivas (2024), who utilized multimodal models like ImageBind and OpenFlamingo to integrate image and text data for enhanced analysis. While their methods leveraged the fusion of multiple data types to capture richer context, we aimed to assess the capabilities of ViT in isolating visual patterns indicative of illicit activity. This comparison highlights the trade-offs between focusing on single-modality models versus multimodal approaches in detecting complex patterns within unstructured data.

Broader Impact

This research demonstrates the potential of AI in analyzing large, unstructured datasets from online marketplaces, providing law enforcement with new tools to monitor and track stolen car parts. From a technical perspective, our project highlights the effectiveness of using ViT for image analysis in this context. As we continue refining our models and consider integrating multimodal approaches inspired by prior work, our collaboration with cross-disciplinary partners aims to make this system a valuable tool for combating the sale of stolen goods online.

As stated previously, the silhouette score for the dataset was notably small, consistent with the numerous outliers in the visualization. This may be attributed to clusters lacking clear definition: several posts contained images without many distinguishable features. This is understandable because, while clusters emphasized specific car parts, many images still displayed various other vehicle components. For instance, although Cluster 2 primarily featured images of powertrains, the posts in this cluster also included shots of the vehicle’s exterior and body panels. Sellers often showcase multiple facets of the vehicle when listing it, which explains the lack of focus on specific car parts.

About the Author

Cameron Armijo is a Computer Science undergraduate student at Baylor University, specializing in data mining.

NSF Award: Using NLP to Identify Suspicious Transactions in Omnichannel Online C2C Marketplaces

Baylor University has been awarded a grant under the NSF Secure and Trustworthy Cyberspace (SaTC) program for Enabling Interdisciplinary Collaboration. The project is led by Principal Investigator Dr. Pablo Rivas and an amazing group of multidisciplinary researchers:

  • Dr. Gissella Bichler from California State University San Bernardino, Center for Criminal Justice Research, School of Criminology and Criminal Justice.
  • Dr. Tomas Cerny is at Baylor University in the Computer Science Department, leading software engineering research.
  • Dr. Laurie Giddens from the University of North Texas, a faculty member at the G. Brint Ryan College of Business.
  • Dr. Stacy Petter is at Wake Forest University in the School of Business. She and Dr. Giddens have extensive research and funding in human trafficking research.
  • Dr. Javier Turek, a Research Scientist in Machine Learning at Intel Labs, is our collaborator in matters related to machine learning for natural language processing.

We also have two Ph.D. students working on this project: Alejandro Rodriguez and Korn Sooksatra.

This project was motivated by the increasing pattern of people buying and selling goods and services directly from other people via online marketplaces. While many online marketplaces enable transactions among reputable buyers and sellers, some platforms are vulnerable to suspicious transactions. This project investigates whether it is possible to automate the detection of illegal goods or services within online marketplaces. First, the project team will analyze the text of online advertisements and marketplace policies to identify indicators of suspicious activity. Then, the team will adapt the findings to a specific context to locate stolen motor vehicle parts advertised via online marketplaces. Together, the work will lead to general ways to identify signals of illegal online sales that can be used to help people choose trustworthy marketplaces and avoid illicit actors. This project will also provide law enforcement agencies and online marketplaces with insights to gather evidence on illicit goods or services on those marketplaces.

This research assesses the feasibility of modeling illegal activity in online consumer-to-consumer (C2C) platforms, using platform characteristics, seller profiles, and advertisements to prioritize investigations using actionable intelligence extracted from open-source information. The project is organized around three main steps. First, the research team will combine knowledge from computer science, criminology, and information systems to analyze online marketplace technology platform policies and identify platform features, policies, and terms of service that make platforms more vulnerable to criminal activity. Second, building on the understanding of platform vulnerabilities developed in the first step, the researchers will generate and train deep learning-based language models to detect illicit online commerce. Finally, to assess the generalizability of the identified markers, the investigators will apply the models to markets for motor vehicle parts, a licit marketplace that sometimes includes sellers offering stolen goods. This project establishes a cross-disciplinary partnership among a diverse group of researchers from different institutions and academic disciplines with collaborators from law enforcement and industry to develop practical, actionable insights.

Self-supervised modeling. Given a corpus from a C2C domain of interest and its associated ontologies, we will extract features and apply attention mechanisms for self-supervised and supervised tasks. The self-supervised models include the completion of missing information and domain-specific text encoding for learning representations. Supervised tasks will then leverage these representations to learn the relationships with targets.

Hybrid Quantum Variational Autoencoders for Representation Learning

One of our recent papers introduces a novel hybrid quantum machine learning approach to unsupervised representation learning by using a quantum variational circuit that is trainable with traditional gradient descent techniques.

Quantum machine learning has gained considerable attention in recent years. Some of the most notable efforts involve variational approaches (Cerezo 2021, Khoshaman 2018, Yuan 2019). Researchers have shown that these models are effective in complex tasks, which warrants further study and opens new doors for applied quantum machine learning research. Another popular approach is to perform kernel learning using a quantum approach (Blank 2020, Schuld 2019, Rebentrost 2014). In this case, the kernel-based projection of data \mathbf{x} produces a feasible linear mapping to the desired target y as follows:

y(\mathbf{x}) = \operatorname{sign}\left( \sum_{j=1}^{M} \alpha_{j} k(\mathbf{x}_{j}, \mathbf{x}) + b \right) \quad (1)

for parameters b and \alpha that need to be provided or learned. This enables the creation of some types of support vector machines whose kernels are calculated such that the data \mathbf{x} is processed in the quantum realm. That is, each data point is amplitude-encoded as \left|\mathbf{x}_{j}\right\rangle = \frac{1}{\|\mathbf{x}_{j}\|} \sum_{k=1}^{N} (\mathbf{x}_{j})_{k} |k\rangle. The work of Schuld et al. expands the theory behind this idea and shows that all kernel methods can be quantum machine learning methods. More recently, Mari et al. (2020) worked on hybrid variational models, focusing on transfer learning, i.e., bringing a pre-trained model (or a piece of it) into another model. In Mari et al.’s case, a pre-trained computer vision model, e.g., ResNet, is combined with a variational quantum circuit that performs classification. The work we present here follows a similar idea, but we focus on the autoencoder architecture rather than a classification model, and we compare the representations learned by a classical model and a variational quantum fine-tuned model.
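As a sketch, Equation (1) can be simulated classically with amplitude encoding. The sketch assumes a fidelity kernel k(\mathbf{x}_j, \mathbf{x}) = |\langle \mathbf{x}_j | \mathbf{x} \rangle|^2. On quantum hardware this overlap would be estimated (for example, with a swap test) rather than computed exactly. The support vectors, \alpha, and b in the usage lines are hypothetical.

```python
import numpy as np

def amplitude_encode(x):
    # |x> = (1/||x||) sum_k x_k |k>: the normalized vector gives the state's amplitudes
    x = np.asarray(x, dtype=float)
    return x / np.linalg.norm(x)

def fidelity_kernel(x_j, x):
    # Assumed kernel |<x_j|x>|^2, computed exactly here instead of estimated on hardware
    return np.abs(np.vdot(amplitude_encode(x_j), amplitude_encode(x))) ** 2

def kernel_classifier(x, support, alpha, b):
    # Equation (1): y(x) = sign(sum_j alpha_j k(x_j, x) + b)
    total = sum(a * fidelity_kernel(x_j, x) for a, x_j in zip(alpha, support))
    return np.sign(total + b)

support = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical support vectors x_j
alpha, b = [1.0, -1.0], 0.0          # hypothetical parameters
```

Amplitude encoding discards the vector's norm, so this kernel compares only directions, much like a squared cosine similarity.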