JOURNAL ARTICLE

Correcting Selection Bias in Big Data by Pseudo-Weighting.

Published In: Journal of Survey Statistics & Methodology, 2023, v. 11, n. 5, p. 1181
Database: Academic Search Ultimate
Authored By: Liu, An-Chiao; Scholtus, Sander; Waal, Ton De

Abstract

This article extends the pseudo-weight estimation method originally proposed by Elliott and Valliant (EV) for correcting selection bias in nonprobability samples. The method relies on a two-sample setup in which a probability sample and a nonprobability sample are drawn from the same population. The extension relaxes the original assumption that the two samples do not overlap, so that it accommodates large sampling fractions and overlapping samples, both of which are common in administrative and Big Data contexts. The authors propose new estimators of the inclusion probabilities that account for possible dependency between the samples, and they introduce pseudo-population bootstrap algorithms for variance estimation that apply to a wide range of sampling designs and propensity models. A simulation study based on Dutch vehicle registration data shows that the proposed methods outperform existing approaches in bias and root mean square error, particularly when sample dependency is taken into account, and that they yield reliable variance estimates. Limitations include assumptions about sample quality, the need for design variables or unique identifiers, and the difficulty of identifying overlapping units; the authors discuss practical alternatives and possible further improvements.

Additional Information

Source: Journal of Survey Statistics & Methodology. 2023/11, Vol. 11, Issue 5, p1181
Document Type: Article
Subject Area: Social Sciences and Humanities
Publication Date: 2023
ISSN: 2325-0984
DOI: 10.1093/jssam/smac029
Accession Number: 173631831
Copyright Statement: Copyright of Journal of Survey Statistics & Methodology is the property of Oxford University Press / USA and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
