10 AIGP Practice Questions Calibrated to BoK v2.1 (2026 Exam) | Archuz

The fastest way to calibrate your AIGP readiness is not to read another framework summary — it is to sit with questions that force you to apply the frameworks under time pressure. These ten questions are designed to surface the precise gaps that send well-prepared candidates out of the exam room frustrated: Provider vs. Deployer confusion, lifecycle sequencing errors, and the "utopian answer" trap that punishes idealists who ignore documented risk acceptance thresholds.

Each question is tagged by domain and cognitive level (Recall, Apply, or Evaluate). The distractor analysis after every answer is where the real learning happens — study the wrong answers as carefully as the right one.

Questions by domain
  • Domain I (Foundations): 2
  • Domain II (Law & Frameworks): 3
  • Domain III (Development): 3
  • Domain IV (Deployment): 2

Before Reading Any Answer Choice

Pause on every scenario question and identify three things: (1) the lifecycle stage described in the stem, (2) whether the organisation is acting as a Provider or Deployer, and (3) which framework's obligations govern the situation. This three-variable lock is the single most reliable technique for eliminating distractors on the real exam.

Domain I — Foundations of AI

Question 01 of 10 Domain I · Foundations · Recall

A financial institution deploys a machine learning model that assigns creditworthiness scores to loan applicants. After several months of operation, the risk team observes that the model's predictions have drifted — applicants from a particular demographic group who historically received lower scores are now receiving even lower scores than the original training data would predict, despite no documented change in the model's parameters.

Which term most precisely describes the phenomenon the risk team has identified?
  • A Model overfitting, caused by insufficient regularization during training.
  • B Feedback loop amplification, where prior model outputs become inputs that reinforce and compound historical bias over time.
  • C Data poisoning, caused by a malicious actor injecting manipulated records into the training pipeline.
  • D Distribution shift, caused by a change in the statistical properties of live applicant data relative to the training set.
Expert Explanation

The key phrase in the stem is "no documented change in the model's parameters." This eliminates explanations that depend on retraining or external interference. The bias is worsening over time through operation — the defining signature of a feedback loop. When a model's outputs (lower credit scores) affect the real-world opportunities available to a group, which then affects the data generated by that group, which is fed back into future model inputs, the original bias compounds. This is distinct from drift, overfitting, or poisoning.

A
Overfitting is a training-time phenomenon; it describes a model that memorises training data too closely. It does not explain worsening bias post-deployment with unchanged parameters.
B
Correct. Feedback loop amplification precisely describes the self-reinforcing mechanism. The model's outputs shape future inputs, compounding the original disparity without any external intervention.
C
Data poisoning requires intentional malicious action on the training pipeline. The stem gives no indication of deliberate interference, making this a misdirection distractor.
D
Distribution shift does describe changing statistical properties, and is a plausible distractor. However, distribution shift is caused by external changes in the population — not by the model's own outputs feeding back into the system. D describes a cause; B describes the mechanism driving the specific pattern observed.
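
The contrast between options B and D can be made concrete with a toy simulation. The sketch below is illustrative only and does not model any real credit system: the function name, the `penalty` factor, and the starting score are all assumptions. The model's parameters stay fixed, and each round's output depresses the next round's input, which is the feedback mechanism the explanation describes.

```python
def simulate_feedback_loop(initial_score: float, penalty: float, rounds: int) -> list:
    """Toy feedback loop: each score feeds the next round's input.

    `penalty` is a hypothetical constant standing in for "a lower score
    reduces access to credit, which worsens the data the group generates".
    It never changes, mirroring the stem's unchanged model parameters.
    """
    scores = [initial_score]
    for _ in range(rounds):
        # The previous output becomes part of the next input.
        scores.append(scores[-1] * (1 - penalty))
    return scores


# Hypothetical starting values, chosen only to show the compounding pattern.
history = simulate_feedback_loop(initial_score=600.0, penalty=0.02, rounds=5)
```

Every value in `history` is lower than the one before it even though nothing external changed. Distribution shift would instead require the incoming population itself to change for reasons outside the model.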
Question 02 of 10 Domain I · Foundations · Apply

An AI governance team is cataloguing the organisation's AI systems for a new enterprise inventory. One system uses a pre-trained large language model to generate first-draft responses for customer service agents, who review and edit each response before it is sent to the customer. A second system automatically approves or denies customer refund requests under $50 based on purchase history, with no human review step.

How should the team classify these two systems in terms of human oversight?
  • A Both systems operate with human-in-the-loop oversight because a human agent is present within each workflow.
  • B System 1 is human-in-the-loop; System 2 is fully automated. Neither requires additional governance controls because the decisions involved are low-stakes.
  • C System 1 is human-in-the-loop; System 2 is human-out-of-the-loop. System 2 requires more robust governance controls due to the absence of human review at the point of decision.
  • D System 1 is human-on-the-loop because agents can override the output; System 2 is human-in-the-loop because the policy parameters were set by a human.
Expert Explanation

The distinction between human-in-the-loop and human-out-of-the-loop turns on whether a human reviews and can modify the AI output before it affects the subject. In System 1, the agent reviews every draft before it reaches the customer — that is the definition of human-in-the-loop. In System 2, the refund decision is made and applied without any human review step — that is human-out-of-the-loop, regardless of who set the policy thresholds. The claim in option B that low dollar value eliminates governance concerns conflates transaction value with governance risk, which the BoK explicitly does not permit.

A
Incorrect. A human being "present" in the workflow is not sufficient. The oversight classification depends on whether that human can intervene before the decision is applied to the affected individual. In System 2, no human touches the individual transaction.
B
The classification of System 2 is correct, but the assertion that low stakes removes governance requirements is not. Dollar value does not govern the oversight requirement; the nature and reversibility of the decision do.
C
Correct. Accurate classification of both systems, and correctly flags that human-out-of-the-loop requires more stringent governance controls, not fewer.
D
A sophisticated distractor. Human-on-the-loop means a human monitors the system and can intervene in the aggregate, but individual decisions execute without per-decision review. System 1 does not match this — agents review each item. Claiming System 2 is human-in-the-loop because a human set the rules is a governance reasoning error the exam tests explicitly.
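
The three oversight categories discussed above reduce to two yes/no questions, which a short sketch can make explicit. The enum, function name, and boolean flags are hypothetical and are not taken from the BoK. The stem does not say whether System 2 is monitored in aggregate, so the example assumes it is not.

```python
from enum import Enum


class Oversight(Enum):
    IN_THE_LOOP = "human-in-the-loop"
    ON_THE_LOOP = "human-on-the-loop"
    OUT_OF_THE_LOOP = "human-out-of-the-loop"


def classify_oversight(per_decision_review: bool, aggregate_monitoring: bool) -> Oversight:
    """Classify by where the human sits relative to each individual decision."""
    if per_decision_review:
        # A human reviews and can change every output before it takes effect.
        return Oversight.IN_THE_LOOP
    if aggregate_monitoring:
        # Decisions execute on their own; a human watches and can intervene overall.
        return Oversight.ON_THE_LOOP
    # No human touches individual decisions. Who set the policy is irrelevant.
    return Oversight.OUT_OF_THE_LOOP


# System 1: agents review and edit each draft before it is sent.
system_1 = classify_oversight(per_decision_review=True, aggregate_monitoring=False)
# System 2: refunds are approved or denied with no human review step (assumed unmonitored).
system_2 = classify_oversight(per_decision_review=False, aggregate_monitoring=False)
```

Note that the function takes no "a human wrote the rules" input, which is the reasoning error option D relies on.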

Domain II — AI Laws & Frameworks

Question 03 of 10 Domain II · EU AI Act · Apply

A European hospital deploys an AI system purchased from a US-based vendor. The system analyses patient imaging data to flag potential anomalies for radiologist review. The vendor's technical documentation states that the system was validated for use as a "clinical decision support tool for trained medical professionals." The hospital has not conducted its own conformity assessment prior to deployment. A regulatory audit is initiated.

Under the EU AI Act, which party bears primary accountability for the missing conformity assessment, and what is the most accurate description of the hospital's role in this context?
  • A The US-based vendor bears sole accountability as the system's manufacturer; the hospital has no obligations under the EU AI Act because it did not develop the system.
  • B The hospital bears sole accountability because it is operating the system within the EU, regardless of where it was developed.
  • C The vendor, as Provider, holds primary accountability for conformity. The hospital, as Deployer, holds independent obligations including verifying that required documentation was received and that the system is used within its intended purpose — obligations the hospital did not meet.
  • D Both parties share equal accountability because the EU AI Act does not distinguish between manufacturers and operators for high-risk medical systems.
Expert Explanation

This question tests the BoK v2.1 Provider/Deployer distinction, one of the most heavily examined concepts in the February 2026 curriculum update. Under the EU AI Act, a Provider (the entity that develops and places the system on the market) bears primary responsibility for conformity assessment, technical documentation, CE marking, and instructions for use. A Deployer (the entity that uses the system in a professional context) has its own independent obligations: confirming documentation was received, using the system within its intended purpose, monitoring it in operation, and implementing human oversight measures. The hospital's failure to verify documentation and conduct its own review constitutes a Deployer obligation breach — but this does not transfer the Provider's conformity assessment duties to the hospital.

A
Incorrect. Deployers have explicit, independent obligations under the Act. Geographic origin of the vendor does not eliminate the hospital's duties as a Deployer operating within EU jurisdiction.
B
Incorrect. The hospital's location-based liability does not transfer the Provider's conformity obligations. Both parties hold distinct, non-transferable duties.
C
Correct. Accurately assigns primary conformity accountability to the Provider while identifying the hospital's independent Deployer obligations that were not met. This is the precise framing the exam rewards.
D
Incorrect and factually wrong. The EU AI Act explicitly distinguishes Provider and Deployer roles and assigns different obligations to each. "Equal accountability" is not a concept the Act employs.
Question 04 of 10 Domain II · NIST AI RMF · Apply

A governance lead at a logistics company is building an AI risk management programme from scratch. She begins by mapping all AI systems in operation, documenting the data they consume, the decisions they produce, and the stakeholders they affect. She then identifies the risk categories most relevant to each system — accuracy risk, fairness risk, security risk, and operational risk. She has not yet established monitoring protocols or mitigation controls.

Using the NIST AI Risk Management Framework, which functions has the governance lead completed, and which function should she address next?
  • A She has completed the Manage function; she should next address the Govern function to establish oversight structures.
  • B She has completed the Govern function; she should next address the Map function to contextualise AI risks.
  • C She has completed the Map function; she should next address the Measure function to assess the likelihood and magnitude of identified risks.
  • D She has completed the Measure function; she should next address the Manage function to implement controls.
Expert Explanation

The NIST AI RMF organises risk activities into four functions: Govern, Map, Measure, and Manage. Govern establishes organisational policies, roles, and accountability structures. Map identifies and classifies the AI system's context, risks, and affected stakeholders. Measure assesses the identified risks in terms of likelihood, severity, and priority. Manage implements controls to address those risks. The governance lead's activities — inventorying systems, documenting data flows, and categorising risk types — are Map activities. She has not yet assessed those risks quantitatively or qualitatively, which is the Measure function. Measure comes before Manage because you cannot prioritise controls without first assessing severity.

A
Manage involves implementing controls — far beyond what is described. The Govern function, not Manage, is foundational, and the stem describes contextual mapping, not control implementation.
B
A common ordering error. Govern does precede Map in sequence, but the activities described (inventorying systems, documenting data, categorising risk types) are definitionally Map activities, not Govern activities. Govern covers organisational policies and accountability, not system cataloguing.
C
Correct. The described activities are Map. Measure is the correct next function: assessing the likelihood and impact of the risks already identified, so that they can be prioritised before controls are designed.
D
Incorrectly identifies her current stage. Measure requires quantitative or qualitative assessment of risk severity; identifying risk categories is a prerequisite to Measure, not Measure itself.
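
One way to drill this question type is to tag each activity in the stem with its function and then read off the next one. The sketch below assumes the ordering used in the explanation above. The activity labels are hypothetical shorthand, not NIST terminology.

```python
from typing import Optional

# Ordering as presented in the explanation above.
RMF_FUNCTIONS = ["Govern", "Map", "Measure", "Manage"]

# Hypothetical activity labels mapped to the function the explanation assigns them to.
ACTIVITY_TO_FUNCTION = {
    "set policies and accountability": "Govern",
    "inventory AI systems": "Map",
    "document data flows": "Map",
    "categorise risk types": "Map",
    "assess likelihood and severity": "Measure",
    "implement controls": "Manage",
}


def next_function(completed_activities: list) -> Optional[str]:
    """Return the function that follows the furthest one the activities reach."""
    reached = max(RMF_FUNCTIONS.index(ACTIVITY_TO_FUNCTION[a]) for a in completed_activities)
    if reached + 1 < len(RMF_FUNCTIONS):
        return RMF_FUNCTIONS[reached + 1]
    return None  # Manage is the last function in this ordering.


stem_activities = ["inventory AI systems", "document data flows", "categorise risk types"]
upcoming = next_function(stem_activities)
```

For the stem's three activities, `upcoming` is `"Measure"`, matching option C.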
Question 05 of 10 Domain II · ISO/IEC 42001 · Recall

A multinational corporation implements ISO/IEC 42001 as the foundation of its AI management system. During a gap assessment, the internal auditor identifies that the organisation has a robust policy for identifying AI risks, but lacks a formal process for determining which individuals and business units should be consulted when assessing the impact of a proposed AI system on external stakeholders such as customers and regulators.

Under ISO/IEC 42001, the organisation's gap most precisely relates to which requirement?
  • A Clause 6 — Planning, specifically the risk assessment process for identifying AI-related harms.
  • B Clause 4 — Context of the Organisation, specifically the requirement to identify and understand the needs and expectations of interested parties.
  • C Clause 9 — Performance Evaluation, specifically the internal audit requirements for reviewing governance processes.
  • D Clause 7 — Support, specifically the competence and awareness requirements for personnel involved in AI governance.
Expert Explanation

ISO/IEC 42001 Clause 4 — Context of the Organisation — requires an organisation to determine who its interested parties are (internal and external), understand what they need and expect from the AI management system, and document which of those needs are relevant to the AIMS scope. The gap described is specifically about the process for identifying which parties to consult — customers and regulators — which is the Clause 4 requirement, not a planning or risk assessment gap. Risk assessment (Clause 6) assumes you already know your interested parties; you cannot assess impact on stakeholders you have not identified.

A
Clause 6 risk assessment relies on Clause 4 stakeholder identification as a prerequisite. The gap is upstream — in identifying the parties at all — not in the risk assessment methodology.
B
Correct. Clause 4.2 requires documenting interested parties and their relevant needs. The absence of a formal process for determining who to consult is a Clause 4 gap.
C
Clause 9 performance evaluation and auditing occurs after the management system is operational. The gap described is foundational, not evaluative.
D
Clause 7 competence relates to the skills and knowledge of people managing the AI system. The gap is not about staff capability — it is about organisational process for stakeholder mapping.

Domain III — Governing AI Development

Question 06 of 10 Domain III · Development Lifecycle · Apply

A data science team is preparing a training dataset for a predictive maintenance model. The dataset combines sensor readings from industrial equipment, maintenance records, and failure logs collected over eight years. The governance team has been asked to review the dataset before model training begins.

During review, the governance team identifies that the failure logs from the first three years were collected under a different classification scheme — what was recorded as "minor fault" before 2017 would be classified as "critical fault" under current standards. The data science team proposes relabelling those records using an automated mapping algorithm before training.

What is the governance team's most appropriate response?
  • A Approve the automated relabelling, as it corrects a known inconsistency and improves data quality prior to training.
  • B Reject the dataset entirely and require the team to source new failure logs that use consistent classification standards throughout.
  • C Request that the relabelling methodology be documented, validated against a sample of original records by domain experts, and that the resulting dataset version be tracked separately with clear lineage notes before training proceeds.
  • D Instruct the team to exclude the pre-2017 records from training to eliminate the classification inconsistency.
Expert Explanation

This question tests data governance principles at the pre-training stage. The governance team's role is not to veto technical decisions but to ensure those decisions are made responsibly, documented, and auditable. The risk in automated relabelling is that the mapping algorithm may not perfectly reflect domain-expert judgement, and the transformation should be traceable. Option C — requiring documentation, expert validation on a sample, and dataset lineage tracking — is the governance-appropriate response: it enables the project to proceed while maintaining accountability for the transformation decision. Option A is inadequate because it accepts technical authority without governance oversight. Options B and D are both forms of avoidance that sacrifice eight years of data without attempting a defensible remediation.

A
Approval without documentation or validation is not governance — it is rubber-stamping. The governance team adds no value if it simply defers to the data science team's proposal.
B
A utopian answer. Rejecting eight years of real operational data is disproportionate when a documented remediation path is available. The exam penalises idealist responses that ignore operational feasibility.
C
Correct. Governance-appropriate: enables the project to proceed, requires accountability measures (documentation, expert validation, lineage tracking), and preserves auditability of the transformation decision.
D
Exclusion would eliminate the first three years of failure data, potentially degrading the model's ability to detect rare failure modes that only appeared in earlier equipment generations. This is a risk-avoidance decision masquerading as governance.
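
The controls named in option C (documented method, expert validation on a sample, tracked lineage) can be captured in a small record. This is a minimal sketch under stated assumptions: the class, the field names, the version strings, the sample counts, and the 95% agreement bar are all hypothetical and do not come from the scenario.

```python
from dataclasses import dataclass, field


@dataclass
class RelabelRecord:
    """Lineage note for one relabelling pass over a dataset."""
    dataset_version: str      # version produced by the relabelling
    parent_version: str       # version it was derived from
    method: str               # description of the mapping algorithm
    sample_size: int          # records checked by domain experts
    expert_agreements: int    # records where experts agreed with the new label
    notes: list = field(default_factory=list)

    @property
    def agreement_rate(self) -> float:
        if self.sample_size <= 0:
            raise ValueError("expert validation needs a non-empty sample")
        return self.expert_agreements / self.sample_size

    def ready_for_training(self, min_agreement: float) -> bool:
        """True only if expert validation meets the agreed bar."""
        return self.agreement_rate >= min_agreement


# All values below are invented for illustration.
record = RelabelRecord(
    dataset_version="failure-logs-v2",
    parent_version="failure-logs-v1",
    method="automated mapping of pre-2017 fault classes to current scheme",
    sample_size=200,
    expert_agreements=188,
    notes=["pre-2017 'minor fault' may map to current 'critical fault'"],
)
approved = record.ready_for_training(min_agreement=0.95)
```

With these invented numbers the agreement rate is 94%, so `approved` is `False` and the mapping would go back for refinement rather than straight to training.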
Question 07 of 10 Domain III · Bias & Fairness · Evaluate

An organisation is evaluating a completed AI model for resume screening. Post-training evaluation metrics show that the model achieves 91% accuracy overall. However, a disaggregated analysis reveals that the model's recall rate — the proportion of genuinely qualified candidates correctly identified — is 88% for male candidates and 71% for female candidates. The model's developer argues that the 91% overall accuracy demonstrates the system is fit for purpose. The organisation's documented AI risk appetite states that demographic performance differentials above 10 percentage points on any primary metric require escalation and remediation before deployment.

What is the governance team's most appropriate recommendation?
  • A Approve deployment. The 91% overall accuracy exceeds the industry benchmark and demonstrates that the model is sufficiently accurate for resume screening.
  • B Reject the model permanently. Differential performance of this magnitude in a hiring context constitutes illegal discrimination under most employment law frameworks and cannot be remediated.
  • C Escalate and require remediation before deployment. The 17-point recall differential between demographic groups exceeds the organisation's documented 10-point threshold, triggering the mandatory escalation and remediation requirement regardless of overall accuracy.
  • D Approve deployment with enhanced monitoring. Recall differentials are less material than precision differentials in screening contexts, and monitoring will capture any emerging bias in production.
Expert Explanation

The stem's key sentence is the risk appetite statement: a differential above 10 percentage points requires escalation and remediation before deployment. The observed differential is 17 points (88% minus 71%). This is a procedural question, not a statistical one. The organisation has pre-committed to a threshold, and that threshold has been exceeded. Option C is the only answer that respects the documented governance control. Options A and D both approve deployment in ways that contradict the stated risk appetite. Option B introduces permanence ("Reject the model permanently") and a legal determination that the governance team is not positioned to make, and it also ignores the possibility of remediation.

A
Overall accuracy of 91% is a compelling number, but it is an aggregate metric that masks the disaggregated failure. The exam tests whether you recognise that aggregate metrics do not satisfy a disaggregated threshold requirement.
B
A common "utopian" distractor. The governance team cannot make legal determinations about illegality, and the permanence of rejection ignores the remediation pathway the risk appetite document explicitly includes.
C
Correct. Applies the documented risk appetite threshold directly. The governance team's role is to enforce pre-established controls, not to substitute its own judgment for the organisation's risk framework.
D
A technically sophisticated distractor. In some fairness literature, precision differentials carry more weight in certain contexts than recall differentials. However, the organisation's risk appetite does not make this distinction — it applies to "any primary metric." Post-deployment monitoring does not satisfy a pre-deployment threshold requirement.
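
The arithmetic behind option C is a one-line comparison, sketched here with the figures from the stem. The function name and the dictionary layout are assumptions. Rates are passed as percentages so the gap is expressed directly in percentage points.

```python
def recall_gap_check(recall_pct_by_group: dict, threshold_points: float) -> tuple:
    """Return (gap in percentage points, whether escalation is triggered)."""
    gap = max(recall_pct_by_group.values()) - min(recall_pct_by_group.values())
    # The stem's rule is "above 10 percentage points", so the comparison is strict.
    return gap, gap > threshold_points


# Figures from the stem: 88% recall for male candidates, 71% for female candidates.
gap, must_escalate = recall_gap_check({"male": 88.0, "female": 71.0}, threshold_points=10.0)
```

Overall accuracy (91%) is never an input to the check, which is why option A's argument does not bear on the documented control.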
Question 08 of 10 Domain III · Governance Artifacts · Recall

A governance lead is designing a standardised documentation framework for all AI systems the company develops internally. She wants a single-page artifact for each model that records the model's intended use, performance metrics, known limitations, evaluation datasets, and intended user population — and is designed to be readable by both technical and non-technical stakeholders.

Which governance artifact is she describing?
  • A A Fundamental Rights Impact Assessment (FRIA), used to evaluate the effect of an AI system on the fundamental rights of affected individuals.
  • B A Model Card, a standardised documentation artifact that summarises a model's purpose, performance characteristics, limitations, and appropriate use cases for diverse audiences.
  • C A System Card, a broader artifact that documents an entire AI system — including components, data pipelines, and integration architecture — rather than a single underlying model.
  • D A Data Sheet, which documents the composition, provenance, and intended use of a dataset rather than the model trained on it.
Expert Explanation

A Model Card is the artifact described. Developed as a transparency mechanism, it documents a model's intended purpose, performance benchmarks across different conditions and population subgroups, evaluation datasets, known limitations, and recommendations for appropriate use — in a format accessible to non-technical readers. The description is highly specific and matches Model Card conventions precisely. The other options are real governance artifacts but serve distinct purposes: FRIAs assess rights impacts of systems (not model-level documentation), System Cards cover full system architecture, and Data Sheets document datasets rather than models.

A
FRIAs are impact assessment frameworks, not summary documentation artifacts. They are forward-looking assessments, not descriptive records of existing model behaviour.
B
Correct. Model Card precisely matches every element in the description: single-page, model-level, covers intended use, performance, limitations, evaluation data, and user population, designed for mixed audiences.
C
A System Card is a plausible distractor for candidates who confuse model-level and system-level documentation. The stem specifies a "model" artifact — not the full system. System Cards document components, pipelines, and integration dependencies beyond the model itself.
D
A Data Sheet (or Datasheet for Datasets) documents the dataset — its source, composition, collection process, and intended uses. It is a complement to a Model Card, not a substitute for it.

Domain IV — Governing AI in Deployment

Question 09 of 10 Domain IV · Monitoring & Incident Response · Evaluate

An insurance company's AI model for claims processing has been in production for fourteen months. The model automatically approves standard claims within seconds. During a routine audit, the monitoring team discovers that over the past six weeks, the model's approval rate for a specific claim category has dropped from 74% to 41% — a shift not explained by any documented change in claims patterns, policy rules, or model parameters. No customer complaints have been filed. The model's output logs show normal confidence scores throughout the period.

What is the governance team's most appropriate immediate response?
  • A Take no immediate action. No customer complaints have been filed, and normal confidence scores confirm the model is operating as intended.
  • B Notify affected customers immediately and issue manual reviews of all denied claims from the past six weeks.
  • C Escalate to an incident investigation: suspend automated decision-making for the affected claim category, route those decisions to human reviewers, and initiate a root cause analysis before determining remediation steps.
  • D Retrain the model on the most recent six weeks of claims data to recalibrate approval rates to historical norms.
Expert Explanation

The scenario describes an unexplained material performance shift: a 33-percentage-point change in approval rate with no documented cause. The absence of customer complaints is not exculpatory; it may reflect that denied customers did not know they had grounds to complain, or had not yet appealed. Normal-looking confidence scores are not reassuring either, because a miscalibrated model can output high confidence on incorrect outputs. The governance-appropriate response is to escalate to investigation, protect affected individuals by routing to humans during the investigation, and determine root cause before any remediation. Retraining without root cause analysis (option D) risks amplifying whatever underlying problem caused the shift.

A
A critical governance reasoning error. Absence of complaints is not a valid substitute for monitoring. Normal confidence scores from a potentially degraded model are not confirmation of correct operation — they are a flag for deeper investigation.
B
Customer notification may eventually be appropriate, but it should follow root cause analysis, not precede it. Issuing blanket manual reviews before understanding the cause is operationally premature and may not be legally required at this stage.
C
Correct. The sequence — suspend, protect, investigate — is the governance-standard incident response. Human routing protects affected individuals while the investigation is underway, preventing further harm from the unexplained failure.
D
Retraining on recent data without root cause analysis would encode the anomalous behaviour if it reflects a data problem, or mask a model degradation that requires a different fix. This is a technical intervention that jumps ahead of the governance investigation step.
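
The trigger and response sequence in option C can be sketched as a simple triage rule. The function name, the action strings, and the 5-point tolerance are hypothetical; the scenario does not state what tolerance the insurer uses. Complaints and confidence scores are deliberately left out of the inputs, since the explanation treats neither as evidence of correct operation.

```python
def triage_rate_shift(baseline_pct: float, current_pct: float,
                      tolerance_points: float, documented_cause: bool) -> list:
    """Return the ordered incident actions for an approval-rate shift, if any."""
    shift = abs(current_pct - baseline_pct)
    if shift <= tolerance_points or documented_cause:
        return []  # within tolerance, or explained by a documented change
    return [
        "suspend automated decisions for the affected category",
        "route those decisions to human reviewers",
        "start root cause analysis before choosing remediation",
    ]


# Figures from the stem: approval rate fell from 74% to 41% with no documented cause.
actions = triage_rate_shift(baseline_pct=74.0, current_pct=41.0,
                            tolerance_points=5.0, documented_cause=False)
```

Retraining does not appear in the returned list because remediation is chosen only after the root cause is known.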
Question 10 of 10 Domain IV · Agentic AI · Evaluate

A technology company deploys an agentic AI system that autonomously manages supplier contract renewals. The system can access internal financial data, send binding emails on behalf of procurement officers, and approve purchase orders up to $100,000 without human sign-off. The system completes tasks across multiple sequential steps, constructing its own plan of action before executing. Three months after deployment, the system renews a supplier contract at a rate 22% above market value without escalating to a human for approval, citing its authority to approve contracts up to $100,000.

Which governance gap most directly caused this outcome?
  • A Insufficient model accuracy. A more capable model would have identified the above-market rate and escalated appropriately.
  • B Inadequate monitoring. Post-deployment monitoring would have detected the anomalous contract value before the renewal was executed.
  • C Absent escalation triggers. The system's authority definition was scoped solely by transaction value, without escalation rules for contextual anomalies such as significant deviation from market pricing — a governance design gap at the pre-deployment stage.
  • D Excessive system autonomy. Agentic systems should never be authorised to send binding communications or approve financial transactions without per-action human approval.
Expert Explanation

This question tests the governance design requirements specific to agentic AI — the BoK v2.1 addition that reflects the shift from static models to autonomous, multi-step, goal-directed systems. The system acted within its technically-defined authority (the transaction was under $100,000). The governance failure was not model capability, not monitoring lag, and not the existence of autonomy itself — it was the design of the authority boundary. A well-governed agentic system requires escalation triggers not just for dollar value, but for contextual anomalies (material deviation from market price is an obvious candidate). This is a pre-deployment governance architecture failure, not a post-deployment monitoring failure. Option D is a "utopian" answer that would eliminate the operational value of agentic systems entirely — the exam does not reward blanket prohibition when targeted governance controls exist.

A
Model accuracy is a distractor for technically-minded candidates. The system did not fail because it lacked the capability to identify the market deviation — it failed because no governance rule required it to act on that information. This is a policy gap, not a capability gap.
B
Post-deployment monitoring would detect the outcome, but the stem states the system executed the renewal — the harm has already occurred. The question asks about the cause, not the detection mechanism. Better monitoring would observe the problem; it would not have prevented it.
C
Correct. The system's authority was defined by a single dimension (dollar value) without contextual escalation triggers. This is the pre-deployment governance design gap that directly produced the outcome. Agentic systems require multi-dimensional authority boundaries.
D
The canonical "utopian answer." Per-action human approval for every binding communication would eliminate the operational purpose of the agentic system entirely. The BoK does not endorse blanket prohibition — it calls for proportionate governance controls calibrated to risk. This answer ignores the risk-managed deployment principle the exam rewards.
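
A multi-dimensional authority boundary of the kind option C calls for might look like the sketch below. The $100,000 limit and the 22% deviation come from the stem. The function name, the 10% deviation tolerance, and the example contract value and rates are assumptions made for illustration.

```python
def escalation_reasons(contract_value: float, proposed_rate: float, market_rate: float,
                       value_limit: float = 100_000,
                       max_deviation: float = 0.10) -> list:
    """Return every reason the agent must hand off to a human (empty means proceed)."""
    reasons = []
    if contract_value > value_limit:
        reasons.append("value above authority limit")
    deviation = (proposed_rate - market_rate) / market_rate
    if deviation > max_deviation:
        reasons.append("rate deviates from market by more than tolerance")
    return reasons


# Hypothetical renewal: under the value limit, but priced 22% above market.
reasons = escalation_reasons(contract_value=80_000, proposed_rate=122.0, market_rate=100.0)

# The system as deployed had only the value check, so the same renewal sails through.
value_only = escalation_reasons(contract_value=80_000, proposed_rate=122.0,
                                market_rate=100.0, max_deviation=float("inf"))
```

`reasons` contains the pricing trigger while `value_only` is empty, which is the gap between the designed boundary and the deployed one. Autonomy is preserved for every renewal that passes both checks, in contrast to the blanket approval rule in option D.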

How to Interpret Your Score

Use the table below to assess your readiness across domains. The passing scaled score on the AIGP is 300 on a 100–500 scale. IAPP does not publish a raw-score conversion, so the common rule of thumb of roughly 70% correct on the full 100-question exam is an estimate only. For practice sets, aim higher: 75%+ across multiple question sets before scheduling.

Score | Signal | Recommended Action
9–10 / 10 | Strong readiness | Expand to full-length timed mock exams. Focus on weak domains.
7–8 / 10 | On track | Review distractor analyses for missed questions. Drill your weakest domain with additional scenario questions.
5–6 / 10 | Foundational gaps remain | Return to the BoK sections for the domains where you missed questions. Study the governance mindset and prioritise scenario-based review over re-reading frameworks.
0–4 / 10 | Material preparation needed | Do not schedule the exam yet. Start with Domain I and II foundations, build vocabulary from the IAPP Glossary, then return to scenario practice.
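
If you track scores across several practice sets, the bands above are easy to encode. This is a small sketch for this ten-question set only; the function name is an assumption.

```python
def readiness_signal(correct: int) -> str:
    """Map a score out of 10 to the signal column of the table above."""
    if not 0 <= correct <= 10:
        raise ValueError("correct must be between 0 and 10")
    if correct >= 9:
        return "Strong readiness"
    if correct >= 7:
        return "On track"
    if correct >= 5:
        return "Foundational gaps remain"
    return "Material preparation needed"


signal = readiness_signal(7)
```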