Understanding the Executive Order on AI: Implications for the Industry and Academia

The White House recently released an executive order titled “Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence.” This directive aims to establish a framework for the responsible development and deployment of AI technologies in the United States. Here are a few key takeaways from this order and its implications for the AI industry and academic researchers.

1. What does this EO mean for the AI industry?

  • Regulatory Framework: The order emphasizes establishing a regulatory framework that ensures the safe and responsible development of AI. Companies must adhere to specific standards and best practices when developing and deploying AI technologies.
  • Transparency and Accountability: The industry is encouraged to adopt transparent methodologies and ensure that AI systems are explainable. This will likely lead to a surge in demand for tools and solutions that offer transparency in AI operations.
  • Collaboration with Federal Agencies: The order promotes cooperation between the private sector and federal agencies. This collaboration fosters innovation while ensuring AI technologies align with national interests and security.
  • Risk Management: Companies are urged to adopt risk management practices that identify and mitigate potential threats AI systems pose. This includes addressing biases, ensuring data privacy, and safeguarding against malicious uses of AI.

At the CRAIG/CSEAI, we’re committed to assisting industry and government partners in navigating this intricate AI regulatory terrain through our research, assessments, and training. Contact us to learn more.

2. What does the EO mean for academics doing AI research?

  • Research Funding: The order highlights the importance of federal funding for AI research. Academics can expect increased support and resources for projects that align with the order’s objectives, especially those focusing on safety, security, and trustworthiness.
  • Ethical Considerations: Given the emphasis on trustworthy AI, researchers will be encouraged to incorporate ethical considerations into their work. This aligns with the growing movement towards AI ethics in the academic community.
  • Collaboration Opportunities: The directive promotes collaboration between academia and federal agencies. This could lead to new research opportunities, partnerships, and access to resources that were previously unavailable.
  • Publication and Transparency: The order underscores the importance of transparency in AI research. Academics will be encouraged to publish their findings, methodologies, and datasets to promote openness and reproducibility in the field.

The “Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence” is a significant step towards ensuring that AI technologies are developed and used responsibly. Both the AI industry and academic researchers have a pivotal role to play in realizing the order’s objectives. This order is, in part, a follow-up to prior White House efforts to promote responsible AI.

AI Orthopraxy: Walking the Talk of Trustworthy AI

In today’s digital age, trust has become a precious commodity. It’s the invisible currency that fuels our interactions with technology and brands. Building trust, especially in technology, is a costly and time-consuming process. However, the payoff is immense. When users trust a system or a brand, they are more likely to engage with it, advocate for it, and remain loyal even when faced with alternatives.

One of the most effective ways to build trust in technology is to ensure it aligns with societal goals and values. When a system or technology operates in a way that benefits society and adheres to its values, it is more likely to be trusted and accepted.

However, artificial intelligence (AI) has faced significant challenges. Despite its immense potential and numerous benefits, trust in AI has suffered. This is due to various factors, including concerns about privacy, transparency, potential biases, and the lack of a clear ethical framework guiding its use.

This is where the concept of AI Orthopraxy comes in. AI Orthopraxy is all about the correct practice of AI. It’s about ensuring that AI is developed and used in a way that is ethical, responsible, and aligned with societal values. It’s about walking the talk of trustworthy AI.

In this talk, I will discuss the concept of AI Orthopraxy, the recent developments in AI, the associated risks, and the tools and strategies we can use to ensure the responsible use of AI. The goal is not just to highlight the challenges but also to provide a roadmap for moving forward in a way that is beneficial for all stakeholders.

Large Language Models (LLMs) and Large Multimodal Models: The Ethical Implications

The journey of Large Language Models (LLMs) has been remarkable. From the early successes of models like GPT and BERT, we have seen a rapid evolution in the capabilities of these models. The most recent iterations, such as ChatGPT, have demonstrated an impressive ability to generate human-like text, opening up many applications in areas like customer service, content creation, and more.

Parallel to this, the field of vision models has also seen significant advancements. The introduction of models like the Vision Transformer (ViT) has revolutionized how we process and understand visual data, leading to breakthroughs in medical imaging, autonomous driving, and more.

However, as with any powerful technology, these models come with their own challenges. One of the most concerning is their fragility, especially when faced with adversarial attacks. These attacks, which involve subtly modifying input data to mislead the model, have exposed the vulnerabilities of these models and raised questions about their reliability.
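To make the idea of a subtle, misleading modification concrete, here is a minimal sketch of one well-known attack, the fast gradient sign method (FGSM), applied to a toy logistic-regression classifier. The talk does not name a specific attack or model; the weights, the input, and the perturbation budget `eps` below are hypothetical values chosen only for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def logistic_loss(w, b, x, y):
    """Binary cross-entropy of a logistic-regression model on one example (y in {0, 1}).

    Written as log(1 + e^z) - y*z and computed with logaddexp for numerical stability.
    """
    z = np.dot(w, x) + b
    return np.logaddexp(0.0, z) - y * z


def fgsm(w, b, x, y, eps):
    """Fast gradient sign method: move every feature by at most eps in the
    direction that increases the loss. For logistic regression, the gradient
    of the loss with respect to the input is (p - y) * w."""
    p = sigmoid(np.dot(w, x) + b)
    grad_x = (p - y) * w
    return x + eps * np.sign(grad_x)


rng = np.random.default_rng(0)
w = rng.normal(size=20)   # hypothetical trained weights
b = 0.0                   # hypothetical bias
x = rng.normal(size=20)   # hypothetical clean input
y = 1.0                   # its true label

x_adv = fgsm(w, b, x, y, eps=0.1)
print("largest per-feature change:", np.max(np.abs(x_adv - x)))
print("loss on clean input:      ", logistic_loss(w, b, x, y))
print("loss on adversarial input:", logistic_loss(w, b, x_adv, y))
```

No feature moves by more than `eps`, yet the loss on the true label goes up; on high-dimensional inputs such as images, many such tiny per-feature changes can add up to a flipped prediction.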

As someone deeply involved in this space, I see both the immense potential of these models and the serious risks they pose. But I firmly believe these risks can be mitigated with careful engineering and regulation.

Careful engineering involves developing robust models resistant to adversarial attacks and biases. It involves ensuring transparency in how these models work and making them interpretable so that their decisions can be understood and scrutinized.

On the other hand, regulation involves setting up rules and standards that guide the development and use of these models. It involves ensuring that these models are used responsibly and ethically and that there are mechanisms in place to hold those who misuse them accountable.

AI Ethics Standards: The Need for a Common Framework

Standards play a crucial role in ensuring technology’s responsible and ethical use. In the context of AI, they can help make systems fair, accountable, and transparent. They provide a common framework that guides the development and use of AI, ensuring that it aligns with societal values and goals.

One of the key initiatives in this space is the P70XX series of standards developed by the IEEE. These standards address various ethical considerations in system and software engineering and provide guidelines for embedding ethics into the design process.

Similarly, the International Organization for Standardization (ISO) has been working on standards related to AI. These standards cover various aspects of AI, including its terminology, trustworthiness, and use in specific sectors like healthcare and transportation.

The National Institute of Standards and Technology (NIST) has led efforts to develop a framework for AI standards in the United States. This framework aims to support the development and use of trustworthy AI systems and to promote innovation and public confidence in these systems.

The potential of these standards goes beyond just guiding the development and use of AI. There is a growing discussion about the possibility of these standards becoming recommended legal practice. This would mean that adherence to these standards would not just be a matter of ethical responsibility but also a legal requirement.

This possibility underscores the importance of these standards and their role in ensuring the responsible and ethical use of AI. However, standards alone are not enough. They need to be complemented by best practices in AI.

AI Best Practices: From Theory to Practice

As we navigate the complex landscape of AI ethics, best practices serve as our compass. They provide practical guidance on how to implement the principles of ethical AI in real-world systems.

One such best practice is the use of model cards for AI models. Model cards are like nutrition labels for AI models. They provide essential information about a model, including its purpose, performance, and potential biases. By providing this information, model cards help users understand what a model does, how well it does, and any limitations it might have.

Similarly, data sheets for datasets provide essential information about the datasets used to train AI models. They include details about the data collection process, the characteristics of the data, and any potential biases in the data. This helps users understand the strengths and weaknesses of the dataset and the models trained on it.
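Model cards and datasheets are, at heart, small structured records, and linking them mirrors the point above that a dataset’s weaknesses carry over to the models trained on it. The sketch below is a minimal Python illustration under our own assumptions: the class names, field names, and example values are hypothetical and do not follow the official Model Cards or Datasheets for Datasets templates, which ask many more questions.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Datasheet:
    """Hypothetical minimal datasheet for a dataset; field names are illustrative."""
    dataset_name: str
    collection_process: str = ""
    characteristics: str = ""
    known_biases: List[str] = field(default_factory=list)

    def missing_fields(self) -> List[str]:
        """Return the free-text fields that are still empty."""
        return [name for name in ("collection_process", "characteristics")
                if not getattr(self, name).strip()]


@dataclass
class ModelCard:
    """Hypothetical minimal model card; field names are illustrative."""
    model_name: str
    purpose: str
    performance: Dict[str, float]  # metric name -> value
    limitations: List[str]
    training_data: Datasheet

    def render(self) -> str:
        """Produce a short plain-text summary, like a nutrition label."""
        lines = [f"Model: {self.model_name}", f"Purpose: {self.purpose}", "Performance:"]
        lines += [f"  {metric}: {value}" for metric, value in self.performance.items()]
        lines += [f"Limitation: {item}" for item in self.limitations]
        lines += [f"Known data bias: {item}" for item in self.training_data.known_biases]
        lines.append(f"Trained on: {self.training_data.dataset_name}")
        gaps = self.training_data.missing_fields()
        if gaps:
            lines.append("Datasheet incomplete: " + ", ".join(gaps))
        return "\n".join(lines)


sheet = Datasheet(
    dataset_name="example-reviews",
    collection_process="Collected from a public review forum",
    known_biases=["Mostly English-language reviews"],
)
card = ModelCard(
    model_name="example-sentiment-v1",
    purpose="Classify short product reviews as positive or negative",
    performance={"accuracy": 0.0},  # placeholder; fill in measured results
    limitations=["Not evaluated on non-English text"],
    training_data=sheet,
)
print(card.render())
```

Because the model card holds a reference to its datasheet, gaps in the dataset documentation (here, the empty `characteristics` field) surface directly on the model’s label.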

A newer practice is the use of Data Statements for Natural Language Processing, proposed to mitigate system bias and enable better science in NLP technologies. Data Statements are intended to address scientific and ethical issues arising from using data from specific populations in developing technology for other populations. They are designed to help alleviate exclusion and bias in language technology, lead to better precision in claims about how NLP research can generalize, and ultimately lead to language technology that respects its users’ preferred linguistic style and does not misrepresent them to others.

However, these best practices are only effective if a trained workforce understands them and can implement them in their work. This underscores the importance of education and training in AI ethics. It’s not enough to develop ethical AI systems; we must cultivate a workforce that can uphold these ethical standards in their work. Initiatives like the CSEAI promote responsible AI and develop a workforce equipped to navigate AI’s ethical challenges.

The Role of the CSEAI in Promoting Responsible AI

The Center for Standards and Ethics in AI (CSEAI) plays a pivotal role in promoting responsible AI. Our mission at CSEAI is to provide applicable, actionable standard practices in trustworthy AI. We believe the path to responsible AI lies at the intersection of robust technical standards and solid ethical guidelines.

One of the critical areas of our work is developing these standards. We work closely with researchers, practitioners, and policymakers to develop standards that are technically sound and ethically grounded. These standards provide a common framework that guides the development and use of AI, ensuring that it aligns with societal values and goals.

In addition to developing standards, we also focus on state-of-the-art collaborative AI research and workforce development. We believe that responsible AI requires a workforce that is not just technically competent but also ethically aware. To this end, we offer training programs and resources that help individuals understand the ethical implications of AI, upcoming regulations, and the importance of bare minimum practices like Model Cards, Datasheets for Datasets, and Data Statements.

As the field of AI continues to evolve, so does the landscape of regulation, standardization, and best practices. At CSEAI, we are committed to staying ahead of these changes. We continuously update our value propositions and training programs to reflect the latest developments in the field and to ensure our standards and practices align with emerging regulations.

As the CSEAI initiative moves forward, we aim to ensure that AI is developed and used in a way that benefits all stakeholders. We believe that, with the right standards and practices, we can harness the power of AI responsibly, ethically, and in line with societal values, in a manner that is profitable for our industry partners and safe, robust, and trustworthy for all users.

Conclusion: The Future of Trustworthy AI

As we look toward the future of AI, we find ourselves amidst a cacophony of voices. As my colleagues put it, on one hand, we have the “AI Safety” group, which often stokes fear by highlighting existential risks from AI, potentially distracting from immediate concerns while simultaneously pushing for rapid AI development. On the other hand, we have the “AI Ethics” group, which tends to focus on the faults and dangers of AI at every turn, creating a brand of criticism hype and advocating for extreme caution in AI use.

However, most of us in the AI community operate in the quiet middle ground. We recognize the immense benefits that AI can bring to sectors like healthcare, education, and vision, among others. At the same time, we are acutely aware of the severe risks and harms that AI can pose. But we firmly believe that, like with electricity, cars, planes, and other transformative technologies, these risks can be minimized with careful engineering and regulation.

Consider the analogy of seatbelts in cars. Initially, many people resisted their use. We felt safe enough, with our mothers instinctively extending an arm in front of us during sudden stops. But when a serious accident occurred, the importance of seatbelts became painfully clear. AI regulation can be seen in a similar light. There may be resistance initially, but with proper safeguards in place, we can ensure that when something goes wrong—and it inevitably will—we will all be better prepared to handle it. More importantly, these safeguards will be able to protect those who are most vulnerable and unable to protect themselves.

As we continue to navigate the complex landscape of AI, let’s remember to stay grounded, to focus on the tangible and immediate impacts of our work, and to always strive for the responsible and ethical use of AI. Thank you.

This is a ChatGPT-generated summary of a noisy transcript of a keynote presented at Marist College on Tuesday, June 13, 2023, at 9 am as part of the Enterprise Computing Conference in Poughkeepsie, New York.

Marketing with ChatGPT: Navigating the Ethical Terrain of GPT-Based Chatbot Technology

Rivas, P.; Zhao, L. Marketing with ChatGPT: Navigating the Ethical Terrain of GPT-Based Chatbot Technology. AI 2023, 4, 375–384. https://doi.org/10.3390/ai4020019

In the dynamic marketing world, the advent of AI-powered chatbot platforms like ChatGPT is a game-changer. This technology, leveraging natural language processing and machine learning, transforms how we interact with AI, offering significant advantages over previous tools. The open-access paper delves into the potential of ChatGPT to revolutionize marketing, provided ethical considerations are taken into account.

The paper argues that ChatGPT can accelerate content creation, enhance market research efficiency, and improve customer service automation. However, it also brings to light the ethical implications and potential risks for marketers, consumers, and other stakeholders. The ethical use of ChatGPT in marketing hinges on transparency, bias mitigation, privacy protection, and continuous monitoring to avoid potential harm while capitalizing on its benefits.

For marketing professionals, this paper is a crucial read to understand how to responsibly integrate ChatGPT into their strategies and ensure that its deployment enhances customer value without compromising ethical standards. Read the paper for free here.

On Becoming an ACM Senior Member

I was honored to recently receive the ACM Senior Member designation from the Association for Computing Machinery (ACM). For my students who asked and anyone else interested, I would like to share with you what this honor is and why I received it.

First, let me tell you a little bit about the ACM. The ACM is the world’s largest educational and scientific computing society, with a mission to advance computing as a science and profession. The ACM Senior Member designation is a distinction awarded to members who have demonstrated significant accomplishments and impact in the computing field. To be considered for this honor, a candidate must have at least 10 years of professional experience in computing and have made significant contributions to the field through research, industry, or education. Being elevated to senior member status in ACM signifies that you are an established leader in the computing field, recognized by your peers for your expertise and contributions. It also comes with certain benefits, such as access to special resources and opportunities for professional development and networking.

Overall, being a senior member of ACM is a great honor and a recognition of your significant contributions to the computing field. 🫶

I was thrilled to learn that, on the recommendation of my mentor in the computer science department, Dr. Hamerly, the dean of the School of Engineering and Computer Science, Dr. Baker, and my peers, I am now an ACM Senior Member. I believe that my contributions to the computing field over the past decade played a significant role in this recognition. Some of my most notable achievements include the following:

  • Technical leadership: leading industry-university collaborative projects, securing funding for students’ research, directing numerous theses and independent studies, developing graduate courses on data mining and machine learning, updating and developing courses, and participating in the education committees at Marist College and Baylor University.
  • Technical contributions: over 90 publications, research in machine learning and numerical optimization, contribution to SVM theory, recent research in efficient representation learning, adversarial learning, and ethical implications of biased and unfair models, and active involvement in developing AI ethics standards through work with IEEE Standards Association.
  • Professional contributions: participation in professional events, including serving as Sponsorship & Budget chair of ACM NYC of Women in Computing and as a Program Committee Chair for NAACL 2022 LXNLP workshop, active membership in professional organizations, and full-time industry experience designing end-to-end systems to support manufacturing and supply management.
  • Recognition: elevation to IEEE Senior Member, sought-after expertise and leadership in deep learning and ethics, involvement in developing AI ethics standards, and commitment to promoting diversity and inclusion in computing through work with ACM NYC of Women in Computing and participation in the AAAI Undergraduate Consortium.

My peers in the computing community have recognized these accomplishments, which have contributed to advancing the field. In addition to my technical contributions, I have been actively mentoring and teaching the next generation of computing professionals.

I am incredibly grateful to the ACM for this honor, and I hope it inspires some of my students to pursue academic excellence. I believe that we can all make a massive difference in the world through our work in computing, and I look forward to continuing to make meaningful contributions to the exciting and rapidly evolving field of machine learning and responsible AI.

Thank you for taking the time to read about my journey to becoming an ACM Senior Member. If you have any questions or would like to learn more about my work, please don’t hesitate to contact me.

Dr. Hamerly, chair of the computer science department and mentor, presented the ACM Senior Member certificate.

(Editorial) Designing AI Using a Human-Centered Approach: Explainability and Accuracy Toward Trustworthiness

In the rapidly evolving world of artificial intelligence (AI), the IEEE Transactions on Technology and Society recently published a special issue that delves into the heart of AI’s most pressing challenges and opportunities. This editorial piece has garnered widespread attention. Read the full editorial here.

J. R. Schoenherr, R. Abbas, K. Michael, P. Rivas and T. D. Anderson, “Designing AI Using a Human-Centered Approach: Explainability and Accuracy Toward Trustworthiness,” in IEEE Transactions on Technology and Society, vol. 4, no. 1, pp. 9-23, March 2023, doi: 10.1109/TTS.2023.3257627.

The Essence of the Special Issue

This special issue comprises eight thought-provoking papers that collectively address the multifaceted nature of AI. The journey begins with a reconceptualization of AI, leading to discussions on the pivotal role of explainability and accuracy in AI systems. The papers emphasize that designing AI with a human-centered approach while recognizing the importance of ethics is not a zero-sum game.

Key Highlights

  1. Reconceptualizing AI: Clarke, a Fellow of the Australian Computer Society, revisits the original conception of AI and proposes a fresh perspective, emphasizing the synergy between human and artifact capabilities.
  2. The Challenge of Explainability: Adamson, a Past President of the IEEE Society on Social Implications of Technology, delves into the complexities of AI systems, highlighting the concealed nature of many AI algorithms and the need for post-hoc reasoning.
  3. Trustworthy AI: Petkovic underscores that trustworthy AI requires both accuracy and explainability. He emphasizes the importance of explainable AI (XAI) in ensuring user trust, especially in high-stakes applications.
  4. Bias in AI: A team of researchers, including Nagpal, Singh, Singh, Vatsa, and Ratha, evaluates the behavior of face recognition models, shedding light on potential biases related to age and ethnicity.
  5. AI in Healthcare: Dhar, Siuly, Borra, and Sherratt discuss the challenges and opportunities of deep learning in the healthcare domain, emphasizing the ethical considerations surrounding medical data.
  6. AI in Education: Tham and Verhulsdonck introduce the “stack” analogy for designing ubiquitous learning, emphasizing the importance of a human-centered approach in smart city contexts.
  7. Ethics in Computer Science Education: Peterson, Ferreira, and Vardi discuss the role of ethics in computer science education, emphasizing the need for emotional engagement to understand the potential impacts of technology.

A Call to Action

As guest editors deeply engaged in human-centric approaches to AI, we challenge all stakeholders in the AI design process to consider the multidimensionality of AI. It’s crucial to move beyond the trade-offs mindset and prioritize both accuracy and explainability. If a decision made by an AI system cannot be explained, especially in critical sectors like finance and healthcare, should it even be proposed?

This special issue is a testament to the importance of ethics, accuracy, explainability, and trustworthiness in AI. It underscores the need for a human-centered approach to designing AI systems that benefit society. For a deeper understanding of each paper and to explore the insights shared by the authors, check out the full special issue in IEEE Transactions on Technology and Society.

How to Show that Your Model is Better: A Basic Guide to Statistical Hypothesis Testing

Do you need help determining which machine learning model is superior? This post presents a step-by-step guide using basic statistical techniques and a real case study! 🤖📈 #AIOrthoPraxy #MachineLearning #Statistics #DataScience

When employing machine learning to address problems, our choice of model plays a crucial role. Evaluating models can be straightforward when performance disparities are substantial, for example, when comparing two large language models (LLMs) on a masked language modeling (MLM) task with perplexities of 71.01 and 28.56, respectively (lower is better). However, if differences among models are minute, it can be challenging to build a solid analysis that shows whether one model is genuinely superior to the others.

This tutorial presents a step-by-step guide to determining whether one model is superior to another. Our approach relies on basic statistical techniques and real datasets. Our study compares four models on six datasets using one metric, standard accuracy; other contexts may involve different numbers of models, metrics, or datasets. We will work with the tables below, which show the properties of the datasets and the performance of two baseline models and two of our proposed models. Our hypothesis, which we want to test, is that the proposed models perform better.

Summary of performance measured with standard accuracy
Summary of the main properties of the datasets considered in this tutorial.

One of the primary purposes of statistics is hypothesis testing. Statistical inference involves taking a sample from a population and using it to draw conclusions about that population. In hypothesis testing, we formulate a null hypothesis, H_0, and an alternative hypothesis, H_A, based on the problem (here, comparing models). Both hypotheses must be concise, mutually exclusive, and exhaustive. For example, our null hypothesis could be that the models perform equally, and the alternative could be that they perform differently.

Why is the ANOVA test not a good alternative?

The ANOVA (Analysis of Variance) test is a parametric test that compares the means of multiple groups. In our case, we have four models to compare with six datasets. The null hypothesis for ANOVA is that all the means are equal, and the alternative hypothesis is that at least one of the means is different. If the p-value of the ANOVA test is less than the significance level (usually 0.05), we reject the null hypothesis and conclude that at least one of the means is different, i.e., at least one model performs differently than the others. However, ANOVA may not always be the best choice for comparing the performance of different models.
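To see what ANOVA actually computes on this problem, the sketch below treats each model’s six accuracies (the same values loaded with pandas later in this post) as one group and computes the one-way F statistic by hand with NumPy. It is a plain textbook one-way ANOVA written out for illustration, not a call to any particular statistics package.

```python
import numpy as np

# Accuracy table used in this tutorial: rows are the N=6 datasets, columns the k=4 models
# (Glorot N., Glorot U., Random G., Repeated G.)
acc = np.array([[0.8937, 0.8839, 0.9072, 0.9102],
                [0.8023, 0.8024, 0.8229, 0.8238],
                [0.7130, 0.7132, 0.7198, 0.7206],
                [0.5084, 0.5085, 0.5232, 0.5273],
                [0.2331, 0.2326, 0.3620, 0.3952],
                [0.5174, 0.5175, 0.5307, 0.5178]])


def one_way_anova_f(groups):
    """One-way ANOVA treating each column (model) as an independent group.

    Note that this ignores the fact that every model was run on the same datasets.
    Returns the F statistic and its two degrees of freedom.
    """
    n, k = groups.shape
    grand_mean = groups.mean()
    group_means = groups.mean(axis=0)
    ss_between = n * np.sum((group_means - grand_mean) ** 2)  # spread of the model means
    ss_within = np.sum((groups - group_means) ** 2)           # spread inside each model
    df_between, df_within = k - 1, k * (n - 1)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


f_stat, df1, df2 = one_way_anova_f(acc)
print(f"ANOVA F({df1}, {df2}) = {f_stat:.4f}")
```

On this table, the F statistic comes out well below 1, so ANOVA would not reject H_0: the large spread of accuracies across datasets (from roughly 0.23 to 0.91) swamps the small but consistent gaps between models within each dataset.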

One reason for this is that ANOVA assumes that the data follows a normal distribution, which may not always be the case for real-world data. Additionally, ANOVA does not take into account the difficulty of classifying certain data points. For example, in a dataset with a single numerical feature and binary labels, all models may achieve 100% accuracy on the training data. However, if the test set contains some mislabeled points, the models may perform differently. In this scenario, ANOVA would not be appropriate because it does not account for the difficulty of classifying certain data points.

Another issue with ANOVA is that it assumes the variances of the groups being compared are equal. This assumption may not hold for datasets with different levels of noise or variability. In such cases, a non-parametric alternative such as the Friedman test, followed by a post-hoc test such as the Nemenyi test, may be more appropriate.

Friedman test

The Friedman test is a non-parametric test for comparing multiple models across multiple datasets. In our example, we want to compare the performance of k=4 different models, i.e., two baseline models, Gabor randomized, and Gabor repeated, on N=6 datasets. First, the models are ranked separately on each dataset, with the best-performing model receiving a rank of 1 (tied models receive the average of the ranks they span), and each model’s ranks are then averaged across all datasets. The Friedman test then tests the null hypothesis, H_0, that all models are equally effective, in which case their average ranks should be equal. The test statistic is calculated as follows:

(1)   \begin{equation*} \chi_{F}^{2}=\frac{12 N}{k(k+1)}\left[\sum_{j=1}^{k} R_{j}^{2}-\frac{k(k+1)^{2}}{4}\right] \end{equation*}

where R_j is the average rank of the j-th model.

The test result can be used to determine whether there is a statistically significant difference between the performance of the models by checking whether \chi_{F}^{2} exceeds the critical value of the \chi^{2} distribution with k-1 degrees of freedom at a particular significance level \alpha. However, since \chi_{F}^{2} is known to be undesirably conservative, we also calculate the F_F statistic proposed by Iman and Davenport, which follows an F distribution with k-1 and (k-1)(N-1) degrees of freedom, as follows:

(2)   \begin{equation*} F_{F}=\frac{(N-1) \chi_{F}^{2}}{N(k-1)-\chi_{F}^{2}}. \end{equation*}

Based on the critical values, we evaluate H_0 using F_{F} and \chi_{F}^{2}; once the null hypothesis is rejected, we apply a post-hoc test. For this, we use the Nemenyi test to establish which models differ significantly in their performance.

We start by ranking the data. First, we load the data and check it against the table shown earlier.

import pandas as pd
import numpy as np

data = [[0.8937, 0.8839, 0.9072, 0.9102],
        [0.8023, 0.8024, 0.8229, 0.8238],
        [0.7130, 0.7132, 0.7198, 0.7206],
        [0.5084, 0.5085, 0.5232, 0.5273],
        [0.2331, 0.2326, 0.3620, 0.3952],
        [0.5174, 0.5175, 0.5307, 0.5178]]

model_names = ['Glorot N.', 'Glorot U.', 'Random G.', 'Repeated G.']

df = pd.DataFrame(data, columns=model_names)

print(df.describe())  # the 'mean' row should match the averages in the table

Output:

       Glorot N.  Glorot U.  Random G.  Repeated G.
count   6.000000   6.000000   6.000000     6.000000
mean    0.611317   0.609683   0.644300     0.649150
std     0.240422   0.238318   0.206871     0.200173
min     0.233100   0.232600   0.362000     0.395200
25%     0.510650   0.510750   0.525075     0.520175
50%     0.615200   0.615350   0.625250     0.623950
75%     0.779975   0.780100   0.797125     0.798000
max     0.893700   0.883900   0.907200     0.910200

Next, we rank the models and get their averages like so:

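# Rank the models within each dataset (row); the highest accuracy gets rank 1
ranks = df.rank(axis=1, method='average', ascending=False)
print(ranks)
print(ranks.describe())  # the 'mean' row holds each model's average rank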

Output:

   Glorot N.  Glorot U.  Random G.  Repeated G.
0        3.0        4.0        2.0          1.0
1        4.0        3.0        2.0          1.0
2        4.0        3.0        2.0          1.0
3        4.0        3.0        2.0          1.0
4        3.0        4.0        2.0          1.0
5        4.0        3.0        1.0          2.0

       Glorot N.  Glorot U.  Random G.  Repeated G.
count   6.000000   6.000000   6.000000     6.000000
mean    3.666667   3.333333   1.833333     1.166667
std     0.516398   0.516398   0.408248     0.408248
min     3.000000   3.000000   1.000000     1.000000
25%     3.250000   3.000000   2.000000     1.000000
50%     4.000000   3.000000   2.000000     1.000000
75%     4.000000   3.750000   2.000000     1.000000
max     4.000000   4.000000   2.000000     2.000000

With this information, we can expand our initial results table to show the rankings by dataset and the average rankings across all datasets for each model.

Now that we have the rankings, we can proceed with the statistical analysis and do the following:

(3)   \begin{align*} \chi_{F}^{2}&=\frac{12 \cdot 6}{4 \cdot 5}\left[\left(3.667^2+3.333^2+1.833^2+1.167^2\right)-\frac{4 \cdot 5^2}{4}\right] \nonumber \\ &\approx 15.400 \nonumber  \end{align*}

(4)   \begin{equation*} F_{F}=\frac{5 \cdot 15.400}{6 \cdot 3-15.400}\approx 29.615 \nonumber \end{equation*}

The critical value of the F distribution at \alpha=0.01 is 5.417. Because our F_F statistic is well above this critical value, we reject H_0 at the \alpha=0.01 significance level, i.e., with 99% confidence.

The critical value can be obtained from any F distribution table. In the table, the degrees of freedom across columns (denoted df_1) are k-1, that is, the number of models minus one; the degrees of freedom across rows (denoted df_2) are (k-1)\times(N-1), that is, the number of models minus one times the number of datasets minus one. In our case, df_1=3 and df_2=15.
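The whole Friedman computation, from raw accuracies to \chi_{F}^{2} (Eq. 1) and F_F (Eq. 2), fits in one small function. The sketch below is our own illustration built on pandas’ `DataFrame.rank`; the function name `friedman_iman_davenport` is hypothetical, and it uses the unrounded average ranks rather than two-decimal approximations.

```python
import numpy as np
import pandas as pd

# Accuracy table from this tutorial: rows are datasets (N=6), columns are models (k=4)
data = [[0.8937, 0.8839, 0.9072, 0.9102],
        [0.8023, 0.8024, 0.8229, 0.8238],
        [0.7130, 0.7132, 0.7198, 0.7206],
        [0.5084, 0.5085, 0.5232, 0.5273],
        [0.2331, 0.2326, 0.3620, 0.3952],
        [0.5174, 0.5175, 0.5307, 0.5178]]
df = pd.DataFrame(data, columns=['Glorot N.', 'Glorot U.', 'Random G.', 'Repeated G.'])


def friedman_iman_davenport(scores):
    """Return (average ranks, chi2_F, F_F) for a datasets-by-models table of scores.

    Higher scores are better, so the best model on each dataset gets rank 1.
    """
    n, k = scores.shape
    ranks = scores.rank(axis=1, method='average', ascending=False)
    avg_ranks = ranks.mean(axis=0)
    sum_sq = np.sum(avg_ranks.to_numpy() ** 2)
    chi2_f = 12 * n / (k * (k + 1)) * (sum_sq - k * (k + 1) ** 2 / 4)  # Eq. (1)
    f_f = (n - 1) * chi2_f / (n * (k - 1) - chi2_f)                     # Eq. (2)
    return avg_ranks, chi2_f, f_f


avg_ranks, chi2_f, f_f = friedman_iman_davenport(df)
print(avg_ranks.round(3))
print(f"chi2_F = {chi2_f:.3f}, F_F = {f_f:.3f}")
```

With the exact average ranks (22/6, 20/6, 11/6, and 7/6), this gives \chi_{F}^{2} = 15.4 and F_F \approx 29.6, comfortably above the 5.417 critical value. The sketch stops at the statistics; if SciPy is available, `scipy.stats.f.ppf(0.99, 3, 15)` is one way to obtain the critical value without a printed table.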

Nemenyi Test

The Nemenyi test is a post-hoc test that compares multiple models after a significant result from the Friedman test. It compares every pair of models: for each pair, the null hypothesis is that the two models perform equally, and the two models are considered significantly different if their average ranks differ by at least the critical difference (CD).

The formula for Nemenyi is as follows:

    \[CD = q_{\alpha} \sqrt{\frac{k(k+1)}{6N}}\]

where q_{\alpha} is the critical value based on the Studentized range statistic divided by \sqrt{2} at the chosen significance level, k is the number of models, and N is the number of datasets. The q_{\alpha} value can be obtained from the following table:

Critical values for the Nemenyi test, which is conducted following the Friedman test, with two-tailed results.

Thus, for our particular case study, the critical differences are:

\[ CD_{\alpha=0.05}=2.569 \sqrt{\frac{4 \cdot 5}{6 \cdot 6}} = 1.915 \tag{5} \]

\[ CD_{\alpha=0.10}=2.291 \sqrt{\frac{4 \cdot 5}{6 \cdot 6}} = 1.708 \tag{6} \]
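As a quick arithmetic check of equations (5) and (6), the following standard-library sketch evaluates CD = q_\alpha \sqrt{k(k+1)/(6N)} with the two q_\alpha values quoted above for k = 4 models and N = 6 datasets. The function name nemenyi_cd is our own, and a different number of models would need the corresponding q_\alpha from the table.

```python
import math

# Two-tailed Nemenyi critical values q_alpha for k = 4 models, as quoted in the
# text (taken there from Demšar's table).
Q_ALPHA = {0.05: 2.569, 0.10: 2.291}
K_MODELS, N_DATASETS = 4, 6


def nemenyi_cd(q_alpha, k, n):
    """Critical difference CD = q_alpha * sqrt(k(k+1) / (6N))."""
    return q_alpha * math.sqrt(k * (k + 1) / (6 * n))


for alpha, q in Q_ALPHA.items():
    print(f"CD at alpha = {alpha:.2f}: {nemenyi_cd(q, K_MODELS, N_DATASETS):.3f}")
```

It prints 1.915 for \alpha = 0.05 and 1.708 for \alpha = 0.10, matching the values above.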

The difference in average rank between the randomized Gabor (Random G.) and the baseline Glorot normal initialization is 3.667 - 1.833 = 1.833, which exceeds CD_{\alpha=0.10}=1.708, so we conclude that randomized Gabor initialization performs significantly better at the 0.10 level. Similarly, the difference between the fixed Gabor (Repeated G.) and the baseline Glorot uniform initialization is 3.333 - 1.167 = 2.167, which exceeds CD_{\alpha=0.05}=1.915, so fixed Gabor initialization performs significantly better at the 0.05 level. Yes, there is sufficient statistical evidence to show that our model is better with high confidence.
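To check every pair at once, rather than only the two comparisons highlighted above, the sketch below compares all six pairs of average ranks against both critical differences. It is an illustrative helper of our own (nemenyi_pairs is a hypothetical name, not a library function) that applies the rule from Demšar (2006): two models differ significantly when their average ranks differ by at least CD.

```python
from itertools import combinations

# Average ranks from the describe() output above (lower is better) and the
# critical differences for k = 4 models and N = 6 datasets.
AVG_RANKS = {"Glorot N.": 22 / 6, "Glorot U.": 20 / 6,
             "Random G.": 11 / 6, "Repeated G.": 7 / 6}
CD = {0.05: 1.915, 0.10: 1.708}


def nemenyi_pairs(avg_ranks, cd):
    """Yield (model_a, model_b, rank difference, significant?) for every pair."""
    for a, b in combinations(avg_ranks, 2):
        diff = abs(avg_ranks[a] - avg_ranks[b])
        # Significant if the average ranks differ by at least the critical difference.
        yield a, b, diff, diff >= cd


for alpha, cd in CD.items():
    print(f"alpha = {alpha:.2f} (CD = {cd})")
    for a, b, diff, sig in nemenyi_pairs(AVG_RANKS, cd):
        print(f"  {a:<11} vs {b:<11} diff = {diff:.3f}  {'significant' if sig else 'n.s.'}")
```

Besides the two comparisons discussed above, this flags Repeated G. against Glorot N. (difference 2.5) as significant at both levels, while Random G. against Glorot U. (difference 1.5) is not significant at either level.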

Things we would like to see in papers

First of all, it would be nice to have a complete table that includes the results of the statistical tests as part of the caption or as a footnote, like this:

Second, graphics always help! A simple diagram is an effective way to present post-hoc test results when comparing multiple classifiers. The figure below, which illustrates the analysis of the table above, displays the average rank of each method along the top line of the diagram. To ease interpretation, the axis is oriented so that the best (lowest) ranks appear on the right, so the methods on the right can be read as the better ones.

Comparison of all models against each other with the Nemenyi test. Models not significantly different at α = 0.10 or α = 0.05 are connected.

When all algorithms are compared against each other, groups of algorithms that are not significantly different are connected with a bold solid line, and the critical difference is shown above the axis. This makes the best-performing models, and whether the gaps between models are significant, visible at a glance, which supports a more informed choice of the best-performing model.
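How the bold bars are placed can be sketched as follows. This is one common construction, assumed here rather than taken from the code behind the original figure: sort the models by average rank and connect each maximal run of consecutive models whose rank spread is below CD. The helper cd_groups is hypothetical, and plotting tools may differ in details such as tie handling.

```python
# Average ranks from the describe() output above (lower is better).
AVG_RANKS = {"Glorot N.": 22 / 6, "Glorot U.": 20 / 6,
             "Random G.": 11 / 6, "Repeated G.": 7 / 6}


def cd_groups(avg_ranks, cd):
    """Return maximal runs of rank-sorted models whose spread is below CD.

    Each run corresponds to one bold bar in a critical-difference diagram:
    no pair inside it differs by at least the critical difference.
    """
    items = sorted(avg_ranks.items(), key=lambda kv: kv[1])
    groups = []
    for i in range(len(items)):
        j = i
        # Extend the run while the spread from its first member stays below CD.
        while j + 1 < len(items) and items[j + 1][1] - items[i][1] < cd:
            j += 1
        group = [name for name, _ in items[i:j + 1]]
        # Keep runs with 2+ models that are not contained in an earlier run.
        if len(group) > 1 and not any(set(group) <= set(g) for g in groups):
            groups.append(group)
    return groups


for alpha, cd in {0.05: 1.915, 0.10: 1.708}.items():
    print(f"alpha = {alpha:.2f}:", cd_groups(AVG_RANKS, cd))
```

At \alpha = 0.05 this gives two bars, {Repeated G., Random G.} and {Random G., Glorot U., Glorot N.}; at \alpha = 0.10 the second bar splits into {Random G., Glorot U.} and {Glorot U., Glorot N.}.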

Main Sources

The statistical tests are based on this paper:

Demšar, Janez. “Statistical comparisons of classifiers over multiple data sets.” Journal of Machine Learning Research 7 (2006): 1-30.

The case study is based on the following research:

Rai, Mehang. “On the Performance of Convolutional Neural Networks Initialized with Gabor Filters.” Thesis, Baylor University, 2021.

President’s Executive Order for Advancing Racial Equity in AI Systems: What It Means for the Future of AI-Based Technology

Summary: President Joe Biden has recently signed an Executive Order intended to advance racial equity and support underserved communities through the federal government. The Order requires federal agencies that use artificial intelligence (AI) systems to take on new equity responsibilities and directs them to prevent and remedy discrimination, including by protecting the public from algorithmic discrimination.

What you should know: The recent Executive Order on Further Advancing Racial Equity and Support for Underserved Communities Through The Federal Government emphasizes the importance of advancing equity for all, including communities that have long been underserved, and of addressing systemic racism in US policies and programs. This order implies that AI systems should be designed so that they do not perpetuate or exacerbate inequities, and that they should be used to address the unfair disparities faced by underserved communities. It also implies that the Federal Government should work with civil society, the private sector, and State and local governments to redress unfair disparities and remove barriers to Government programs and services, which the development and deployment of ethical and responsible AI systems could facilitate. Additionally, the order emphasizes the need for evidence-based approaches to equitable policymaking and implementation, which can be supported by collecting and analyzing data on the impacts of AI systems on different communities. Therefore, AI practitioners should ensure that their systems are designed, developed, and deployed to promote equity, fairness, and inclusivity, in line with the Federal Government’s commitment to advancing racial equity and supporting underserved communities.

The Center for Standards and Ethics in Artificial Intelligence (CSEAI)

Following the President’s Executive Order, we at the CSEAI recognize the critical role of artificial intelligence in promoting fairness, accountability, and transparency. As a research center committed to developing responsible AI techniques, we believe our work can help meet the challenges and opportunities of emerging regulation, standardization, and best practices in AI systems. We are inviting industry members to partner with us financially and take part in collaborative research on trustworthy AI. Our mission is to provide applicable, actionable, standard practices in trustworthy AI and to train a workforce that enables fairness, accountability, and transparency. We believe our work will help mitigate the operational, liability, and reputational risks of AI adoption.

The CSEAI brings together leading universities to conduct collaborative research in responsible AI techniques. We are committed to workforce development and to providing accessible standards, best practices, testing, and compliance. We are proud to be a part of the NSF IUCRC Program and are excited to be supported by the NSF, which provides a standard agreement and an organizational and legal framework.

Join us in creating a better future for all Americans by developing responsible AI practices that promote fairness, accountability, and transparency. By partnering with the CSEAI, you will have the opportunity to work with a dedicated team of researchers, participate in cutting-edge research, and help shape the future of AI. Contact us today to learn more about partnering with the CSEAI.

Contact Pablo_Rivas@Baylor.edu and find out more at www.cseai.center.

Diving Into Large Language Models: An Exploration of ChatGPT and Its Alternatives

An abstract illustration that depicts a central hub or nucleus from which lines and arrows radiate outwards to represent the different layers.

Large Language Models (LLMs) have become a hot topic in the world of machine learning, with chatbots like ChatGPT and other models gaining widespread popularity. However, keeping up with the latest research and advancements in this rapidly evolving field can be challenging. To help you catch up, we’ve compiled a list of 11 essential research papers that every LLM enthusiast should read, plus three papers on the ethical concerns these models raise. From the original Transformer architecture to recent innovations in efficiency and alignment, these papers will give you a comprehensive understanding of the field and help you stay ahead of the curve. So whether you’re a seasoned LLM practitioner or just getting started, read on to discover the key papers that will take your understanding of this exciting field to the next level.

Foundational Papers on LLM Architecture and Pretraining:

  • “Attention is All You Need” by Vaswani et al.: This paper introduces the Transformer architecture, which uses scaled dot-product attention to process sequences of tokens. It has since become the basis for many state-of-the-art LLMs. (https://arxiv.org/abs/1706.03762)
  • “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding” by Devlin et al.: This paper describes BERT, a powerful LLM that uses masked language modeling to pre-train a bidirectional Transformer encoder. BERT has achieved impressive results on various natural language processing tasks. (https://arxiv.org/abs/1810.04805)
  • “Improving Language Understanding by Generative Pre-Training” by Radford et al.: This paper introduces GPT, which pre-trains a Transformer decoder as a language model on unlabeled text and then fine-tunes it for specific tasks. It was one of the first models to demonstrate the effectiveness of large-scale unsupervised pretraining. (https://www.cs.ubc.ca/~amuham01/LING530/papers/radford2018improving.pdf)
  • “BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension” by Lewis et al.: BART is an LLM that combines elements of both encoder and decoder architectures and can be fine-tuned for a variety of natural language tasks. (https://arxiv.org/abs/1910.13461)
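For readers new to the Transformer, the scaled dot-product attention at its core, softmax(QK^T/\sqrt{d_k})V, fits in a short function. The sketch below is a single-head, unmasked, pure-Python illustration written for clarity rather than speed; the function names and the toy Q, K, and V values are our own assumptions, not taken from the paper.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(Q, K, V):
    """Single-head, unmasked attention: softmax(Q K^T / sqrt(d_k)) V.

    Q is n_q x d_k, K is n_k x d_k, V is n_k x d_v, all given as lists of rows.
    """
    d_k = len(K[0])
    d_v = len(V[0])
    out = []
    for q in Q:
        # Dot product of this query with every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        # Output row = attention-weighted average of the value rows.
        out.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(d_v)])
    return out


Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[10.0, 0.0], [0.0, 10.0]]
result = scaled_dot_product_attention(Q, K, V)
print(result)
```

Here the query matches the first key more closely, so the output leans toward the first value vector; practical implementations compute the same thing as batched matrix products on a GPU.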

Methods for Improving LLM Efficiency:

  • “FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness” by Dao et al.: This paper proposes FlashAttention, an IO-aware algorithm that computes exact attention in tiles, reducing reads and writes to GPU high-bandwidth memory. This makes attention faster and lets its memory use grow linearly, rather than quadratically, with sequence length. (https://arxiv.org/abs/2205.14135)
  • “Cramming: Training a Language Model on a Single GPU in One Day” by Geiping and Goldstein: This paper investigates how far a BERT-style language model can get when trained from scratch on a single consumer GPU for one day, and which changes to the training pipeline make the most of that limited budget. (https://arxiv.org/abs/2212.14034)
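FlashAttention's exactness rests on an "online softmax": scores can be processed block by block while a running maximum and normalizer are rescaled, so the full attention matrix never has to be stored. The pure-Python sketch below illustrates only that rescaling idea for a single query with scalar values. It is our own simplification (the function names and the block_size parameter are hypothetical), not the paper's CUDA kernel, which also tiles over queries, keys, and values and recomputes attention in the backward pass.

```python
import math


def naive_attention_row(scores, values):
    """Reference: softmax(scores) applied to scalar values, all at once."""
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def blockwise_attention_row(scores, values, block_size=2):
    """Same result, but visiting the scores and values one block at a time.

    m is the running max, norm the running softmax normalizer, and acc the
    running weighted sum; earlier partial results are rescaled whenever the
    max grows.
    """
    m, norm, acc = -math.inf, 0.0, 0.0
    for start in range(0, len(scores), block_size):
        s_blk = scores[start:start + block_size]
        v_blk = values[start:start + block_size]
        m_new = max(m, max(s_blk))
        scale = math.exp(m - m_new)  # exp(-inf) == 0.0 on the first block
        w_blk = [math.exp(s - m_new) for s in s_blk]
        norm = norm * scale + sum(w_blk)
        acc = acc * scale + sum(w * v for w, v in zip(w_blk, v_blk))
        m = m_new
    return acc / norm


scores = [0.3, 2.0, -1.0, 4.5, 0.0]
values = [1.0, 2.0, 3.0, 4.0, 5.0]
print(naive_attention_row(scores, values), blockwise_attention_row(scores, values))
```

Both functions return the same number up to floating-point rounding, for any block size.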

Methods for Controlling LLM Outputs:

  • “Training Language Models to Follow Instructions with Human Feedback” by Ouyang et al.: This paper introduces InstructGPT, which fine-tunes GPT-3 on human-written demonstrations and then applies reinforcement learning from human feedback (RLHF) so that its outputs better follow user instructions. (https://arxiv.org/abs/2203.02155)
  • “Constitutional AI: Harmlessness from AI Feedback” by Bai et al.: This paper proposes training a harmless AI assistant using a short list of written principles (a “constitution”) instead of human labels for harmful outputs: the model critiques and revises its own responses and is then trained with reinforcement learning from AI feedback. (https://arxiv.org/abs/2212.08073)

Alternatives to ChatGPT:

  • “BLOOM: A 176B-Parameter Open-Access Multilingual Language Model” by the BigScience Workshop (Le Scao et al.): BLOOM is an open-access, decoder-only LLM trained by a large research collaboration on text in 46 natural languages and 13 programming languages. (https://arxiv.org/abs/2211.05100)
  • “Improving Alignment of Dialogue Agents via Targeted Human Judgements” by Glaese et al.: This paper presents Sparrow, a dialogue agent developed by DeepMind that is trained with reinforcement learning from human feedback, guided by a set of rules, and able to cite evidence retrieved from web search to support its factual claims. (https://arxiv.org/abs/2209.14375)
  • “BlenderBot 3: A Deployed Conversational Agent That Continually Learns to Responsibly Engage” by Shuster et al.: BlenderBot 3 is a conversational LLM developed by Meta AI that can search the internet for information to incorporate into its responses. (https://arxiv.org/abs/2208.03188)

Important Ethical Concerns Regarding LLMs:

  • “On the Opportunities and Risks of Foundation Models” by Rishi Bommasani et al.: This paper discusses the opportunities and risks associated with “foundation models,” a new class of machine learning models trained on large and diverse datasets. The paper highlights the technical, social, and ethical challenges of deploying foundation models in various domains. (https://arxiv.org/abs/2108.07258)
  • “GPT-3: Its Nature, Scope, Limits, and Consequences” by Luciano Floridi and Massimo Chiriatti: This paper examines the capabilities and limitations of GPT-3, a state-of-the-art language model, and argues that it is not designed to pass tests of mathematical, semantic, or ethical questions. The paper concludes that GPT-3 is not the beginning of a general form of artificial intelligence. (https://link.springer.com/article/10.1007/s11023-020-09548-1)
  • “On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜” by Emily M. Bender et al.: This paper raises concerns about the risks associated with LLMs like GPT-3, including their environmental and financial costs, and recommends strategies for mitigating those risks. (https://dl.acm.org/doi/abs/10.1145/3442188.3445922)

Before you go ahead and start reading these papers, remember that LLMs such as ChatGPT and its alternatives have revolutionized NLP and hold immense potential for a wide range of applications. However, we must also be mindful of the ethical concerns surrounding these models, such as potential biases and risks of misuse. As the field continues to evolve, we must prioritize ethical considerations and work towards developing models that align with human values and promote the greater good. With the right approach, large language models can enable us to build a more inclusive and equitable future where AI and human collaboration can drive innovation and positive change.

Students on a Mission: Tackling Online Criminal Activity Under NSF’s REU Program

From left to right: Austin, Patrick, Mia, Misty, Garrett, and Andrew.

The Baylor.AI lab is thrilled to announce the launch of a new research stage in our project that uses NLP to identify suspicious transactions in omnichannel online C2C marketplaces. Under the guidance of Dr. Pablo Rivas and Dr. Tomas Cerny, two groups of undergraduate students participating in NSF’s REU program will contribute to this exciting project.

Misty and Andrew, under the direction of Dr. Pablo Rivas, will be working on designing data collection strategies for the project. Their goal is to gather relevant information to support the research and ensure the accuracy of the findings.

Meanwhile, under Dr. Tomas Cerny’s direction, Patrick, Mia, Austin, and Garrett will focus on data visualization and large graph understanding. Their role is crucial in helping to understand and interpret the data collected so far.

The research project investigates the feasibility of automating the detection of illegal goods or services within online marketplaces. As more people turn to online marketplaces for buying and selling goods and services, it is becoming increasingly important to ensure the safety of these transactions.

The project will first analyze the text of online advertisements and marketplace policies to identify indicators of suspicious activity. The findings will then be adapted to a specific context – locating stolen motor vehicle parts advertised via online marketplaces – to determine general ways to identify signals of illegal online sales.

The project brings together the expertise of computer science, criminology, and information systems to analyze online marketplace technology platform policies and identify platform features and policies that make platforms more vulnerable to criminal activity. Using this information, the researchers will generate and train deep learning-based language models to detect illicit online commerce. The models will then be applied to markets for motor vehicle parts to assess their effectiveness.

This research project represents a significant step forward in the fight against illegal activities within online marketplaces. The project results will provide law enforcement agencies and online marketplaces with valuable insights and evidence to help them crack down on illicit goods or services sold on their platforms.

We are incredibly excited to see what Misty, Andrew, Patrick, Mia, Austin, and Garrett will accomplish through this project. We can’t wait to see their impact on online criminal activity research. Stay tuned for updates on their progress and more information about this cutting-edge project.

Sic’em, Bears!

(Editorial) Emerging Technologies, Evolving Threats: Next-Generation Security Challenges

Volume 3, Issue 3 of the IEEE Transactions on Technology and Society has been published, with contributions on the security challenges posed by emerging technologies and their effects on society.

Our editorial is freely accessible; it briefly introduces the research in this issue and explains why these topics are relevant. It also discusses GPT and DALL-E as examples of the great advances in AI and of the ethical considerations surrounding them. Take a look:

T. Bonaci, K. Michael, P. Rivas, L. J. Robertson and M. Zimmer, “Emerging Technologies, Evolving Threats: Next-Generation Security Challenges,” in IEEE Transactions on Technology and Society, vol. 3, no. 3, pp. 155-162, Sept. 2022, doi: 10.1109/TTS.2022.3202323.