Quality Engineering Design for AI Platform Adoption

Introduction

We are living in the AI Golden Age (1). Businesses that use AI become more creative, competitive, and responsive. Software creation and consumption processes have matured thanks to the software-as-a-service (SaaS) paradigm and advances in cloud computing. Most companies would rather “purchase” AI capabilities than “create” their own. As a result, SaaS providers such as Salesforce, SAP, and Oracle have added AI platform capabilities, giving rise to the AI-as-a-Service (AIaaS) model. This shift has simplified AI adoption for businesses (2).

Testing plays a particularly critical role in quality assurance (QA) for AI platform adoption, and it is difficult for the following reasons:

AI testing requires intelligent test methodologies, virtualized cloud resources, specialist expertise, and AI-enabled tools.
AI platform providers deliver updates frequently, and testing must keep pace.
AI products generally lack transparency and are difficult to explain (3), so trust is harder to earn.
What matters is not only the AI product but also the quality of the training models and the data. Traditional approaches that validate cloud resources, algorithms, interfaces, and user settings are therefore insufficient on their own; learning, reasoning, perception, manipulation, and other capabilities must also be assessed.

In a plug-and-play AI solution model, the software provider supplies the AI logic. The end user enhances the experience by building the interfaces, providing data for training the logic, and training the logic in the solution environment. First, as in traditional testing, we should test the data, algorithm, integration, and user experience. Second, the training model should be verified to confirm the solution’s functional fit, which extends testing to reasoning, planning, and learning, among other capabilities. Third, a strategy for validating the AI algorithm should be created. Finally, functional validation should cover the techniques the AI logic may employ, such as search, optimization, and probabilistic reasoning.

Core Necessity in the AI Platform Adoption: Continuous Testing

For AI platform adoption, QA maturity through a high degree of automation is crucial. As businesses modernize their infrastructure and engineering practices, release cycles become faster and highly automated. Continuous integration (CI) techniques have proven useful here (4): code is checked in several times a day and rebuilt each time, generating multiple QA feedback loops. To implement CI properly, the build and deployment processes must be automated. While CI is built on automation, test automation is what enables continuous delivery (CD) (5); in short, CI is the driving force behind CD. As Agile and DevOps models have evolved, continuous testing (CT), continuous development, and continuous delivery have all become more institutionalized.

Data, applications, infrastructure, and other aspects of an enterprise are continually changing. At the same time, the SaaS vendor constantly updates the AI solution to improve user experience and efficiency. In such a dynamic environment, it is critical to develop a continuous testing ecosystem in which a fully automated test environment validates not only the ever-changing enterprise IT assets but also the evolving versions of the AI product.

The following factors must be considered when establishing a CT ecosystem:

Store automated test scripts in an enterprise version control system, using a version control repository for both the automation and application codebases. This makes it easier to align test assets with application and data assets.
Integrate the automation suite with a code/data build and deployment platform to enable centralized execution and reporting. It is critical to match each code/data build to the appropriate automation suite, and tool-based auto-deployment on every build is a must to avoid manual intervention.
Divide the automation suite into multiple layers of tests to provide faster feedback at each stage. For example, an AI health check can confirm that services are operational after changes to interfaces and data structures are deployed, and an AI smoke test can confirm that essential system functions work and that there are no blocking problems.
Test the training model as well. AI testing should verify whether the solution has learned from both supervised and unsupervised training. It is crucial to replay the same events multiple times to ensure that the responses are consistent with the training. Similarly, testing should include a procedure for training the solution on bugs, errors, exceptions, and other mistakes; if exception handling is carefully designed, fault/error tolerance can be established.
Plan to provide AI training and education throughout the adoption process. With the CT arrangement, learning can continue from testing through production rollout with fewer concerns about transfer learning.
Use intelligent regression. If the full regression cycle takes too long, CT should carve out a subset of regression tests for run-time execution, focused on the most heavily impacted areas, so that feedback arrives within a suitable time frame. Using machine learning to build a probabilistic model that selects regression tests (6) aligned with the specific code and data build makes efficient use of cloud resources and accelerates testing.
Schedule full regression regularly. Depending on the build frequency, it can run overnight or on weekends. This is the CT ecosystem’s final feedback loop, and running tests on parallel threads or machines helps reduce feedback time.
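The intelligent-regression idea can be illustrated with a small selection routine. This is a minimal sketch, not the machine learning model of reference (6). It assumes that each test has a hypothetical mapping to the modules or data feeds it covers, plus a pass/fail history. It scores each test with a Laplace-smoothed historical failure rate, scaled by the test's overlap with the current change set. The names `RegressionCandidate`, `failure_probability`, and `select_regression` are illustrative.

```python
from dataclasses import dataclass


@dataclass
class RegressionCandidate:
    name: str
    covered_modules: frozenset  # modules/data feeds the test exercises (hypothetical mapping)
    runs: int                   # historical executions
    failures: int               # historical failures
    duration_s: float           # average runtime in seconds


def failure_probability(test: RegressionCandidate, changed: set) -> float:
    """Heuristic estimate of how likely the test is to fail on this build."""
    if not test.covered_modules:
        return 0.0
    # Laplace smoothing keeps a never-failed test from scoring exactly zero.
    base_rate = (test.failures + 1) / (test.runs + 2)
    # Share of the test's footprint touched by the current code/data change.
    overlap = len(test.covered_modules & changed) / len(test.covered_modules)
    return base_rate * overlap


def select_regression(tests, changed, budget_s):
    """Greedily pick the tests with the highest risk per second that fit the time budget."""
    scored = [(failure_probability(t, changed), t) for t in tests]
    scored = [(p, t) for p, t in scored if p > 0]  # skip tests unrelated to the change
    scored.sort(key=lambda pt: pt[0] / pt[1].duration_s, reverse=True)
    selected, used = [], 0.0
    for _, test in scored:
        if used + test.duration_s <= budget_s:
            selected.append(test.name)
            used += test.duration_s
    return selected


if __name__ == "__main__":
    suite = [
        RegressionCandidate("test_scoring_api", frozenset({"scoring", "api"}), 50, 5, 30),
        RegressionCandidate("test_feature_etl", frozenset({"etl"}), 40, 1, 120),
        RegressionCandidate("test_ui_login", frozenset({"ui"}), 60, 2, 20),
    ]
    print(select_regression(suite, {"scoring", "etl"}, budget_s=100))  # ['test_scoring_api']
```

Ranking by risk per second favors cheap, high-risk tests. The scheduled full regression remains the safety net for anything the heuristic misses.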

When testing involves no manual intervention, bugs, faults, mistakes, and algorithmic exceptions all become sources of learning for the AI solution. Similarly, actual usage and user preferences observed during testing provide training input that should continue into production.

Assuring the Data Extraction in AIaaS Adoption

Data quality is the most critical success criterion for AI adoption. Useful data exists both inside and outside the company, and a prerequisite is the ability to extract meaningful data and make it available to the AI engine. ETL (extract, transform, load) refers to a data pipeline that takes data from several sources, transforms it according to business rules, and loads it into a destination data store. Enterprise data integration (EDI), enterprise application integration (EAI), and cloud integration platform as a service (iPaaS) have all evolved from ETL (7, 8, 9). Regardless of these technical developments, the importance of data assurance has only increased. Data assurance should cover functional testing activities such as map-reduce process validation, transformation logic validation, data validation, and data storage validation. Non-functional factors such as performance, failover, and data security should also be addressed.
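Transformation logic validation and data validation can be sketched as a source-to-target reconciliation. The sketch assumes that rows arrive as dictionaries keyed by a unique `id`. It also assumes that the tester writes an independent `expected_rule` function encoding the business rule; here that is a hypothetical rule that converts amounts to cents and upper-cases country codes. The check covers completeness, duplicate keys, and field-level transformation results. It does not cover the non-functional aspects.

```python
def reconcile(source_rows, target_rows, expected_rule, key="id"):
    """Compare loaded target rows with source rows; return a list of defect messages."""
    defects = []

    # Completeness and uniqueness: each source key should land in the target exactly once.
    target_by_key = {}
    for row in target_rows:
        if row[key] in target_by_key:
            defects.append(f"duplicate key in target: {row[key]}")
        target_by_key[row[key]] = row
    source_keys = {row[key] for row in source_rows}
    target_keys = set(target_by_key)
    defects += [f"missing in target: {k}" for k in sorted(source_keys - target_keys)]
    defects += [f"unexpected in target: {k}" for k in sorted(target_keys - source_keys)]

    # Transformation logic: recompute expected values independently and compare field by field.
    for src in source_rows:
        tgt = target_by_key.get(src[key])
        if tgt is None:
            continue
        for field, expected in expected_rule(src).items():
            if tgt.get(field) != expected:
                defects.append(f"{src[key]}.{field}: expected {expected!r}, got {tgt.get(field)!r}")
    return defects


def expected_rule(src):
    """Hypothetical business rule, written independently of the production ETL code."""
    return {"amount_cents": round(src["amount"] * 100), "country": src["country"].upper()}


if __name__ == "__main__":
    source = [
        {"id": 1, "amount": 12.5, "country": "gb"},
        {"id": 2, "amount": 3.99, "country": "de"},
        {"id": 3, "amount": 7.0, "country": "fr"},
    ]
    target = [
        {"id": 1, "amount_cents": 1250, "country": "GB"},
        {"id": 2, "amount_cents": 399, "country": "de"},  # transformation bug: not upper-cased
    ]
    for defect in reconcile(source, target, expected_rule):
        print(defect)
```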

Structured data is easier to manage, but unstructured data from outside the company’s IT landscape should be treated with caution. Stream processing principles help prepare data in motion, that is, event-driven processing of data produced or received from websites, external apps, mobile devices, sensors, and other sources. Checking its quality through established quality gates is essential. Popular data sources include messaging platforms such as Twitter, Instagram, and WhatsApp, which use cloud-based communication systems to connect applications, services, and devices across diverse technologies. Deep learning technologies allow computers to learn from large amounts of data, and some of this data requires neural network solutions to handle sophisticated signal processing and pattern recognition problems such as speech-to-text transcription, handwriting recognition, and facial recognition (10, 11, 12). Quality gates should be developed to test the data that flows from these platforms.
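A quality gate for data in motion can be expressed as a generator that checks each event as it arrives. This minimal sketch assumes a hypothetical event schema (`event_id`, `source`, `text`, `ts`) and simple rule-based checks. A real gate would load its rules from configuration and write failures to a quarantine store rather than to a Python list.

```python
from datetime import datetime, timezone

# Hypothetical minimum schema for inbound social/IoT events.
REQUIRED_FIELDS = {"event_id": str, "source": str, "text": str, "ts": str}


def violations(event, now):
    """Return the quality rules this event breaks (an empty list means it passes)."""
    problems = [
        f"{name}: missing or not {ftype.__name__}"
        for name, ftype in REQUIRED_FIELDS.items()
        if not isinstance(event.get(name), ftype)
    ]
    if isinstance(event.get("text"), str) and not event["text"].strip():
        problems.append("text: empty")
    if isinstance(event.get("ts"), str):
        try:
            ts = datetime.fromisoformat(event["ts"])  # expects e.g. 2024-01-01T10:00:00+00:00
        except ValueError:
            problems.append("ts: not ISO-8601")
        else:
            if ts.tzinfo is None:
                problems.append("ts: missing timezone")
            elif ts > now:
                problems.append("ts: in the future")
    return problems


def quality_gate(events, quarantine, now=None):
    """Lazily pass clean events downstream; divert failing ones to the quarantine list."""
    now = now or datetime.now(timezone.utc)
    for event in events:
        problems = violations(event, now)
        if problems:
            quarantine.append((event, problems))
        else:
            yield event
```

Because `quality_gate` is a generator, it can sit between a stream consumer and the AI engine without buffering the whole stream.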

The following are some design considerations for AI-driven QA orchestration.

Automate quality gates: Machine learning algorithms can determine whether data is “good to go” or “bad to go” based on historical and perceived standards.
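A z-score check against historical batch profiles is one simple way to approximate this decision. It is a statistical baseline standing in for the machine learning model the text describes. The metric names (`rows`, `null_rate`) and the 3-sigma limit are illustrative assumptions.

```python
import statistics


def learn_baseline(history):
    """Learn the mean and standard deviation of each batch metric from batches judged good."""
    baseline = {}
    for metric in history[0]:
        values = [batch[metric] for batch in history]
        baseline[metric] = (statistics.mean(values), statistics.stdev(values))
    return baseline


def good_to_go(batch, baseline, z_limit=3.0):
    """Return (decision, reasons); the batch fails if any metric drifts beyond z_limit sigmas."""
    reasons = []
    for metric, (mean, stdev) in baseline.items():
        value = batch[metric]
        if stdev == 0:
            if value != mean:
                reasons.append(f"{metric}={value} (history is constant at {mean})")
            continue
        z = (value - mean) / stdev
        if abs(z) > z_limit:
            reasons.append(f"{metric}={value} (z={z:+.1f})")
    return not reasons, reasons
```

A trained classifier could later replace `good_to_go` without changing how the pipeline consumes its `(decision, reasons)` result.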

Predict root causes: Triaging a data defect, or determining its root cause, not only helps prevent future bugs but also supports continual improvement of data quality. Test teams can use patterns and correlations to build machine learning algorithms that trace errors back to their source (13). Remedial checks and fixes can then be applied to the data before it moves on to the next stage, paving the way for self-testing and self-healing.
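One way to use patterns and correlations for triage is to train a classifier on past defect records. The sketch below is a hand-rolled categorical naive Bayes, chosen only because it fits in a few lines of standard-library Python. The feature names (`stage`, `source`) and root-cause labels are hypothetical, and this is not the method of reference (13).

```python
import math
from collections import Counter, defaultdict


class RootCausePredictor:
    """Categorical naive Bayes: score(cause) = log P(cause) + sum of log P(value | cause)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                        # Laplace smoothing strength
        self.cause_counts = Counter()
        self.value_counts = defaultdict(Counter)  # (cause, feature) -> counts of values
        self.seen_values = defaultdict(set)       # feature -> distinct values seen in training

    def fit(self, records):
        """records: iterable of (features dict, root cause label) from past triage."""
        for features, cause in records:
            self.cause_counts[cause] += 1
            for feature, value in features.items():
                self.value_counts[(cause, feature)][value] += 1
                self.seen_values[feature].add(value)
        return self

    def rank(self, features):
        """Return candidate root causes, most likely first."""
        total = sum(self.cause_counts.values())
        scores = {}
        for cause, n in self.cause_counts.items():
            score = math.log(n / total)
            for feature, value in features.items():
                k = len(self.seen_values.get(feature, ())) + 1  # +1 reserves mass for unseen values
                count = self.value_counts.get((cause, feature), Counter())[value]
                score += math.log((count + self.alpha) / (n + self.alpha * k))
            scores[cause] = score
        return sorted(scores, key=scores.get, reverse=True)
```

The ranked list can drive which remedial check runs first, with the top candidate triggering an automated fix only when the team trusts the model.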

Use precognitive monitoring: Machine learning algorithms can look for symptoms in data patterns and related coding issues, such as high memory utilization that could lead to an outage, so that teams can take corrective action immediately. For example, the AI engine can automatically start a parallel operation to reduce server load.
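Precognitive monitoring can start as simply as extrapolating a resource trend. This sketch fits an ordinary least-squares line to recent memory-utilization samples and estimates how long until a threshold is crossed. The linear model, the 90% threshold, and the five-minute horizon are illustrative assumptions. A production system would use a richer model and a real metrics feed.

```python
def seconds_until_breach(samples, threshold):
    """samples: [(t_seconds, memory_pct), ...] in time order.
    Returns estimated seconds until the fitted trend reaches threshold,
    0.0 if it already has, or None if the trend is flat or falling."""
    n = len(samples)
    if n < 2:
        return None
    t_mean = sum(t for t, _ in samples) / n
    y_mean = sum(y for _, y in samples) / n
    sxx = sum((t - t_mean) ** 2 for t, _ in samples)
    if sxx == 0:
        return None
    slope = sum((t - t_mean) * (y - y_mean) for t, y in samples) / sxx
    latest_t = samples[-1][0]
    fitted_now = y_mean + slope * (latest_t - t_mean)
    if fitted_now >= threshold:
        return 0.0
    if slope <= 0:
        return None
    return (threshold - fitted_now) / slope


def should_alert(samples, threshold=90.0, horizon_s=300.0):
    """Raise an early warning when a breach is forecast within the horizon."""
    eta = seconds_until_breach(samples, threshold)
    return eta is not None and eta <= horizon_s
```

An alert from `should_alert` is the hook where an automated action, such as starting the parallel operation mentioned above, could be attached.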

Failover: Machine learning algorithms can detect problems and recover automatically, so processing continues while the failure is logged.
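The recovery half of this idea can be sketched without any machine learning. The routine retries a primary handler, falls back to an alternative path when retries are exhausted, logs every failure, and keeps processing. The `primary` and `fallback` callables are placeholders for whatever the pipeline actually uses. An ML component could later decide when to switch paths.

```python
import logging

logger = logging.getLogger("aiaas.pipeline")


def process_with_failover(records, primary, fallback, max_retries=2):
    """Process every record, logging failures instead of halting the pipeline.
    Returns (results, unrecoverable_records)."""
    results, unrecoverable = [], []
    for record in records:
        for attempt in range(1, max_retries + 1):
            try:
                results.append(primary(record))
                break
            except Exception as exc:  # broad on purpose: any fault triggers failover
                logger.warning("primary failed for %r (attempt %d/%d): %s",
                               record, attempt, max_retries, exc)
        else:  # no break: every retry failed, so switch to the fallback path
            try:
                results.append(fallback(record))
            except Exception as exc:
                logger.error("fallback failed for %r: %s", record, exc)
                unrecoverable.append(record)
    return results, unrecoverable
```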

Assuring the AI Algorithm in AIaaS Adoption

Developing tests is straightforward when the internals of a software system are known. The “interpretability” of AI and ML in an AI platform solution is low (3): only the input/output mapping is known, and the mechanism behind the underlying AI function (for example, prediction) cannot be examined or understood. Traditional black-box testing helps with the input/output mapping, but without transparency, people will find it hard to trust the testing model. The AI platform solution is indeed a black box; however, specialized AI methodologies can help assess AI capability beyond input/output mapping. AI-driven black-box testing methodologies to consider in the design include:

Posterior predictive checks (PPC) generate replicated data under the fitted model and compare it with the observed data. Posterior predictive testing can thus be used to “search for systematic disparities between real and simulated data.”
Genetic algorithms can be used to optimize test scenarios (14). Test case creation amounts to finding a set of input data that provides the most coverage when fed into the product under test; solving this problem optimizes the test cases. Genetic algorithms are adaptive heuristic search algorithms that mimic the basic operations of natural evolution: selection, crossover, and mutation. When test cases are generated through heuristic search, feedback from the tested application is used to check whether the test data meets the testing requirements, and the feedback loop alters the test data incrementally until the requirements are met.
Automatic test case generation using neural networks. Neural networks are physical cellular systems that can acquire, store, and process experiential knowledge, and they tackle learning tasks by simulating the human brain. Neural network learning techniques can be used to generate test cases automatically (15). In this model, a neural network is trained on a set of test cases applied to the original version of the AI platform product, with training focused solely on the system’s inputs and outputs. The trained network can then serve as an artificial oracle to assess the accuracy of the output produced by new, potentially faulty versions of the product.
Model-based regression test selection using fuzzy logic. Model-based selection is effective in projects that already use model-driven development, but a major drawback is that the models are usually built at a high level of abstraction. They lack the detail needed to establish traceability relationships between the models and the coverage-related execution traces of code-level test cases. Fuzzy logic-based approaches can automatically refine abstract models into detailed models that can be used to identify these traceability relationships (16). Because the refinement involves some uncertainty, fuzzy logic is used to classify test cases as re-testable according to the probabilistic accuracy associated with each refinement.
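The genetic-algorithm approach can be illustrated on a toy problem. In this sketch, each individual is a small test suite of `(age, income)` inputs for a hypothetical, instrumented function `classify_risk`. Fitness is the number of distinct branches the suite covers, which plays the role of the feedback from the tested application. The loop applies tournament selection, one-point crossover, and mutation until full coverage is reached or the generation budget runs out. It is a generic textbook GA under these assumptions, not the specific algorithm of reference (14).

```python
import random

ALL_BRANCHES = {"minor", "adult", "senior", "high_income", "standard_income", "senior_high_income"}


def classify_risk(age, income):
    """Hypothetical system under test, instrumented to report which branches ran."""
    hit = set()
    if age < 18:
        hit.add("minor")
    elif age > 65:
        hit.add("senior")
    else:
        hit.add("adult")
    if income > 100_000:
        hit.add("high_income")
        if age > 65:
            hit.add("senior_high_income")
    else:
        hit.add("standard_income")
    return hit


def fitness(suite):
    """Feedback from the tested code: number of distinct branches the suite covers."""
    covered = set()
    for age, income in suite:
        covered |= classify_risk(age, income)
    return len(covered)


def random_case(rng):
    return (rng.randint(0, 100), rng.randint(0, 200_000))


def evolve(suite_size=3, pop_size=20, generations=100, mutation_rate=0.2, seed=7):
    """Evolve test suites (suite_size >= 2) toward full branch coverage."""
    rng = random.Random(seed)
    population = [[random_case(rng) for _ in range(suite_size)] for _ in range(pop_size)]

    def pick():  # tournament selection: fittest of three random suites
        return max(rng.sample(population, 3), key=fitness)

    for _ in range(generations):
        best = max(population, key=fitness)
        if fitness(best) == len(ALL_BRANCHES):
            return best  # requirement met, stop early
        next_gen = [best]  # elitism keeps the best suite found so far
        while len(next_gen) < pop_size:
            parent_a, parent_b = pick(), pick()
            cut = rng.randint(1, suite_size - 1)  # one-point crossover
            child = parent_a[:cut] + parent_b[cut:]
            # Mutation: replace each test case with a fresh random one at the given rate.
            child = [random_case(rng) if rng.random() < mutation_rate else case for case in child]
            next_gen.append(child)
        population = next_gen
    return max(population, key=fitness)
```

Elitism guarantees that coverage never decreases from one generation to the next. In a real setting, the fitness function would read coverage data from the instrumented product instead of a hand-written branch set.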

Assuring the Integration and Interfaces in AIaaS Adoption

All SaaS solutions, including AI-as-a-service, expose a set of defined web services with which enterprise applications and other intelligent sources interact to obtain the desired result. Web services have evolved to enable interoperability across platforms, and thanks to this flexibility, most can now be consumed by a wide range of systems. The complexity of these interfaces calls for more testing, and evaluating the compatibility of these APIs within a CI/CD system is even more important.
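Evaluating API compatibility in a CI/CD pipeline often starts with a contract check on every response. The sketch assumes a hypothetical JSON contract for an AI scoring endpoint (`prediction`, `confidence`, `model_version`). It tolerates extra fields, so additive vendor updates pass, while removed or retyped fields fail the build.

```python
import json

# Hypothetical response contract for a vendor AI scoring endpoint.
CONTRACT = {"prediction": str, "confidence": float, "model_version": str}


def contract_violations(payload, contract=CONTRACT):
    """Return contract breaches; unknown extra fields are deliberately allowed."""
    violations = []
    for field, expected_type in contract.items():
        if field not in payload:
            violations.append(f"missing field: {field}")
        elif not isinstance(payload[field], expected_type):
            actual = type(payload[field]).__name__
            violations.append(f"{field}: expected {expected_type.__name__}, got {actual}")
    confidence = payload.get("confidence")
    if isinstance(confidence, float) and not 0.0 <= confidence <= 1.0:
        violations.append("confidence: outside [0, 1]")
    return violations


if __name__ == "__main__":
    response = json.loads('{"prediction": "churn", "confidence": 0.87, '
                          '"model_version": "2024.1", "explanation": []}')
    print(contract_violations(response))  # []
```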

The main difficulty is virtualizing web services and validating data flow between the AI platform solution and application or IoT interfaces. The following are the key reasons why interface/web service testing is difficult:

There is no user interface to test unless the service is integrated with another source, which may not yet be ready for testing.
All parts of the service must be validated, regardless of whether or how often the application uses them.
The service’s underlying security parameters must be verified.
Different communication protocols are used to connect to services.
Multiple channels calling a service simultaneously can cause performance and scalability problems.
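The last point, concurrent callers, can be probed with a small load harness. This sketch sends requests from a thread pool to any callable and reports the 95th-percentile latency. `fake_ai_service` is a hypothetical stand-in that simply sleeps. A real run would wrap an HTTP client for the vendor endpoint and compare the results against an agreed latency budget.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def fake_ai_service(request):
    """Hypothetical stand-in for a remote AI call with roughly 10 ms latency."""
    time.sleep(0.01)
    return {"echo": request}


def concurrency_probe(service, requests, workers):
    """Call service for every request using `workers` parallel channels; summarize latency."""
    def timed_call(request):
        start = time.perf_counter()
        service(request)
        return time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=workers) as pool:
        latencies = list(pool.map(timed_call, requests))
    return {
        "calls": len(latencies),
        "p95_s": statistics.quantiles(latencies, n=20)[-1],  # needs at least 2 calls
        "max_s": max(latencies),
    }


if __name__ == "__main__":
    print(concurrency_probe(fake_ai_service, range(50), workers=10))
```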

Testing the interface layer will particularly demand:

Replicating the behavior of a component or application. AI testing for accuracy, completeness, consistency, and speed should emulate the complexity of AI application interfaces with humans, machines, and software.
Looking for unusual code usage. Adopting real-world applications and open-source libraries can bring non-standard code and data into the enterprise IT ecosystem, so these should be verified.
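Replicating a component’s behavior is usually done with service virtualization. The sketch below stands up a local HTTP stub for a hypothetical sentiment endpoint (`POST /v1/sentiment`) using only the Python standard library. Integration tests can then run before the real AI service or its upstream sources are ready. The canned rule inside the stub is an assumption and is deliberately trivial.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.request import ProxyHandler, Request, build_opener


class VirtualSentimentService(BaseHTTPRequestHandler):
    """Hypothetical virtual stand-in for a vendor AI endpoint."""

    def do_POST(self):
        if self.path != "/v1/sentiment":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length) or b"{}")
        words = str(body.get("text", "")).lower().split()
        # Canned, deterministic behavior replaces the real model.
        label = "negative" if "not" in words else "positive"
        payload = json.dumps({"label": label, "confidence": 0.9}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, format, *args):  # silence per-request logging during tests
        pass


def start_virtual_service():
    """Start the stub on a free localhost port in a background thread."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), VirtualSentimentService)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def get_sentiment(base_url, text):
    """Client code under test; in production base_url would point at the real vendor service."""
    request = Request(
        f"{base_url}/v1/sentiment",
        data=json.dumps({"text": text}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    opener = build_opener(ProxyHandler({}))  # bypass environment proxies so calls stay local
    with opener.open(request, timeout=5) as response:
        return json.loads(response.read())


if __name__ == "__main__":
    server = start_virtual_service()
    base_url = f"http://127.0.0.1:{server.server_address[1]}"
    print(get_sentiment(base_url, "I am not happy with this"))  # {'label': 'negative', 'confidence': 0.9}
    server.shutdown()
    server.server_close()
```

Switching `base_url` from the stub to the real endpoint lets the same client tests run in every CI/CD stage.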

Assuring the User Experiences in AIaaS Adoption

Customer experience has become critical to corporate success in the new reality of remote work and life, and it is an even more important goal in AI adoption. Non-functional testing is a well-established practice that ensures a positive user experience by validating performance, security, and accessibility. In general, next-generation technologies have increased the complexity of experience assurance.

In the broader AI testing framework, here are some key design considerations for experience assurance.

Design for the experience rather than testing for it. The end-user perspective should inform the enterprise AI strategy, and it is critical that the testing team accurately represents the actual customers. Involving customers early in the design process benefits the design and builds trust through early access.

Use a build-test-optimize model to deliver agility and automation. User experience should be considered during scrum testing cycles, and early experience testing helps implement the build-test-optimize cycle.

Continuous security is crucial in an Agile strategy. Make the corporate security team part of the Agile team so that it can 1) own and validate the organization’s threat model at the scrum level, and 2) assess structural vulnerabilities, from the perspective of a hypothetical hacker, across all the multi-channel interfaces the SaaS AI solution architecture may expose.

Speed is essential. The volume, velocity, variety, and variability of AI data call for pre-processing, parallel/distributed processing, and/or stream processing. Performance testing helps optimize the design for distributed processing, which is essential to meet the system’s speed expectations.

Nuanced text and voice testing is crucial. According to several studies, conversational AI remains at the top of the corporate agenda, while augmented reality, virtual reality, edge AI, and other new technologies continue to emerge. Text, voice, and natural language processing testing should therefore be part of the test strategy.
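One way to test text nuance without labeled data for every phrasing is metamorphic testing. The tester applies meaning-preserving rewrites to an utterance and requires the predicted intent to stay the same. In the sketch below, `classify_intent` is a hypothetical keyword-based stand-in for a vendor NLU endpoint, and the transformation set is illustrative. The typo transformation is included to show how brittleness surfaces.

```python
def classify_intent(utterance):
    """Hypothetical stand-in for a vendor NLU endpoint; replace with the real client."""
    text = utterance.lower()
    if "refund" in text or "money back" in text:
        return "request_refund"
    if "balance" in text:
        return "check_balance"
    return "fallback"


# Rewrites that should not change the meaning of an utterance.
TRANSFORMS = {
    "upper_case": str.upper,
    "extra_whitespace": lambda s: "  " + s.replace(" ", "  ") + "  ",
    "polite_prefix": lambda s: "Hi there, " + s,
    "typo_drop_last_char": lambda s: s[:-1],
}


def metamorphic_failures(classify, seed_utterances, transforms=TRANSFORMS):
    """Metamorphic relation: a meaning-preserving rewrite must not change the predicted intent."""
    failures = []
    for utterance in seed_utterances:
        expected = classify(utterance)
        for name, transform in transforms.items():
            variant = transform(utterance)
            got = classify(variant)
            if got != expected:
                failures.append((utterance, name, variant, expected, got))
    return failures


if __name__ == "__main__":
    for failure in metamorphic_failures(classify_intent, ["I want a refund", "What is my balance"]):
        print(failure)
```

The same harness applies to voice if the rewrites operate on audio, for example by adding noise or changing speed, before speech-to-text.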

Simulation helps test limits. Testing realistic user scenarios is critical for experience assurance. For AI, testing for exceptions, errors, and violations helps predict system behavior, which in turn helps assess the error/fault tolerance of AI applications.
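Testing limits by simulation can start with a table of boundary and malformed inputs. This sketch feeds such inputs to a scoring function and sorts each outcome into one of three groups: accepted, cleanly rejected (a documented `ValueError`), or fault (any other exception or an out-of-range output). `robust_score` and `naive_score` are hypothetical functions used only to show the difference.

```python
EDGE_CASES = [None, "", "   ", "a" * 100_000, "🙂" * 50, float("nan"), -1, 0, 10**12, [], {}]


def naive_score(text):
    """Hypothetical scorer with no input validation."""
    return min(1.0, len(text.split()) / 10)


def robust_score(text):
    """Same scorer, but rejects unusable input with a documented ValueError."""
    if not isinstance(text, str) or not text.strip():
        raise ValueError("expected non-empty text")
    return min(1.0, len(text.split()) / 10)


def in_unit_interval(score):
    return isinstance(score, float) and 0.0 <= score <= 1.0


def fault_tolerance_report(func, inputs, is_valid_output, handled=(ValueError,)):
    """Classify each input as accepted, rejected (handled error), or fault."""
    report = {"accepted": [], "rejected": [], "faults": []}
    for value in inputs:
        try:
            output = func(value)
        except handled:
            report["rejected"].append(value)
        except Exception as exc:
            report["faults"].append((value, type(exc).__name__))
        else:
            if is_valid_output(output):
                report["accepted"].append(value)
            else:
                report["faults"].append((value, f"invalid output {output!r}"))
    return report
```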

Transparency, trust, and diversity. To reduce risk and build confidence in AI, it is necessary to verify the trust that enterprise users place in AI conclusions, confirm the transparency requirements for data sources and algorithms, and ensure diversity in data sources and in user/tester involvement. Testers need not only broader domain expertise but also familiarity with the technical aspects of the data, algorithms, and integration processes within the wider enterprise IT landscape.

Conclusion

Continuous testing is a necessary component of AI platform adoption. A modular approach should be used to refine data, algorithm, integration, and experience assurance operations. This enables a continuous testing ecosystem in which enterprise IT is always prepared to accept frequent changes to internal and external AI components.
