- Counterfactual Debiasing for Fact Verification
In this paper, we have proposed a novel counterfactual framework CLEVER for debiasing fact-checking models. Unlike existing works, CLEVER is augmentation-free and mitigates biases at the inference stage. In CLEVER, the claim-evidence fusion model and the claim-only model are independently trained to capture the corresponding information.
- Leaving the barn door open for Clever Hans: Simple features predict. . .
This phenomenon, widely known in human and animal experiments, is often referred to as the 'Clever Hans' effect, where tasks are solved using spurious cues, often involving much simpler processes than those putatively assessed. Previous research suggests that language models can exhibit this behaviour as well.
- EVALUATING THE ROBUSTNESS OF NEURAL NETWORKS: AN EXTREME VALUE THEORY APPROACH
…te the CLEVER scores for the same set of images and attack targets. To the best of our knowledge, CLEVER is the first attack-independent robustness score that is capable of handling the large networks studied in this paper, so we directly r… ℓ2 and ℓ1 norms, and Figure 4 visualizes the results for the ℓ1 norm. Similarly, Table 2 comp…
- From Control Application to Control Logic: PLC Decompile Framework. . .
To address the challenge, we propose a PLC decompile framework named CLEVER, which can analyze the control application and extract the control logic. First, we propose a simulation execution based code extraction method, which is utilized to filter the control logic related data.
- Leaving the barn door open for Clever Hans: Simple features predict. . .
This paper focuses on exploring the "Clever Hans" effect, also known as the "shortcut learning" effect, in which the trained model exploits simple and superficial correlations instead of intended capabilities to solve the evaluation tasks.
- On the Planning Abilities of Large Language Models : A Critical . . .
While, as we mentioned earlier, there can be thorny “clever hans” issues about humans prompting LLMs, an automated verifier mechanically backprompting the LLM doesn’t suffer from these. We tested this setup on a subset of the failed instances in the one-shot natural language prompt configuration using GPT-4, given its larger context window.
- Submissions | OpenReview
Leaving the barn door open for Clever Hans: Simple features predict LLM benchmark answers. Lorenzo Pacchiardi, Marko Tesic, Lucy G Cheke, Jose Hernandez-Orallo. 27 Sept 2024 (modified: 05 Feb 2025). Submitted to ICLR 2025.
- Weakly-Supervised Affordance Grounding Guided by Part-Level. . .
In this work, we focus on the task of weakly supervised affordance grounding, where a model is trained to identify affordance regions on objects using human-object interaction images and egocentric…