ScreenIT Quality Assurance Results
Comparison of COVID preprint analyses with ScreenIT before and after the update
The original ScreenIT pipeline continuously screened COVID preprints from January 2020 to June 2022. In March 2021, a major change to the pipeline code occurred, introducing new categories and changing the output schema. The data set therefore needs to be rescreened with a single version of the pipeline. This report performs quality assurance on the updated version of the pipeline before the whole data set is screened with it. Data prior to the update were taken from the latest database. Screenings with the updated version were done via a different API that only sees the PDFs and no metadata from the preprint servers. Two hundred preprints were randomly selected for the comparison, but due to bugs in the pipeline, four preprints could not be screened with the updated pipeline, resulting in a total of 196 screened preprints.
Sciscore Results
Compared to the previous version, the updated version more often delivered “not required” for the ethics statement, for example for modeling papers. However, this verdict also propagated to all downstream analyses, even when the manuscript actually contained statements related to, e.g., randomization or attrition (Figure 1). In addition, a couple of funding statements were incorrectly detected as ethics statements; attrition had a few false positives, blinding a few false negatives, and power analysis a couple of false positives and one false negative.
rtransparent Results
A full 100% of preprints in the data set had conflict of interest (COI) statements and funding statements; however, only some included these in the PDF of the manuscript (Figure 2). As the updated version screened only the PDF input, the manual assessment was also based only on the text in the PDF. The updated version of the pipeline missed COI statements if they had non-standard headings (e.g. “conflicts:” or no section title at all) or if they were given on the first page of the manuscript. A check of the extracted text in the pipeline container under /temp/all_text/ showed that some papers were missing the first page of text. This omission also affected other metrics where the relevant text was on the first page.
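Such missing first pages can be spotted programmatically. Below is a minimal sketch, assuming the pipeline writes one plain-text file per preprint into /temp/all_text/ named after the PDF; the paths, file naming, and the helper function are illustrative assumptions, not the pipeline's actual layout.

```python
# Minimal sketch: flag preprints whose extracted text lacks the PDF's first page.
# Paths and naming convention are assumptions for illustration.
import pathlib
import fitz  # PyMuPDF

def first_page_missing(pdf_path: str, extracted_txt_path: str) -> bool:
    """Heuristic: True if the start of the PDF's first page does not
    appear anywhere in the text the pipeline extracted."""
    with fitz.open(pdf_path) as doc:
        first_page = doc[0].get_text()
    extracted = pathlib.Path(extracted_txt_path).read_text(errors="ignore")
    # Compare a short normalized snippet to tolerate whitespace differences.
    snippet = " ".join(first_page.split())[:200]
    return snippet not in " ".join(extracted.split())

for pdf in pathlib.Path("preprints").glob("*.pdf"):
    txt = pathlib.Path("/temp/all_text") / (pdf.stem + ".txt")
    if txt.exists() and first_page_missing(str(pdf), str(txt)):
        print(f"{pdf.name}: first page appears to be missing from extracted text")
```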
The updated version of the pipeline also missed funding statements if they had non-standard headings (e.g. “financial disclosure”, “financing”, “funding/support”) or if the funding information was in the acknowledgements or on the first page of the PDF (see previous paragraph).
The updated version of the pipeline did not detect registration numbers in 16 cases where the previous version did. In 14 of these cases the updated version was correct; in the remaining two, a PROSPERO registration number was cited but missed by the updated pipeline version.
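For illustration, cited PROSPERO IDs (e.g. CRD42020123456) follow a regular pattern that can be matched directly. The regex below is a hypothetical sketch of such a match and not the pattern the pipeline actually uses.

```python
# Illustrative sketch of matching a cited PROSPERO registration ID.
# Not the pipeline's actual detection logic.
import re

PROSPERO = re.compile(r"\bCRD\s?4\d{10}\b", re.IGNORECASE)

text = "The protocol was registered with PROSPERO (CRD42020123456)."
print(PROSPERO.findall(text))  # ['CRD42020123456']
```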
limitation-recognizer Results
There were only three discrepancies between the previous and updated pipeline versions (Figure 3). In all three cases, the updated version caught limitations that the previous version did not.
TrialIdentifier Results
The updated version of TrialIdentifier yielded several false positives (Figure 4). Some of these were grant numbers or accession numbers given in supplemental tables, while the majority were EudraCT numbers falsely detected in DOIs in the reference section (Figure 5).
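The mechanism behind these false positives is easy to reproduce: an unanchored EudraCT-style pattern (YYYY-NNNNNN-CC) can match digit runs inside reference-list DOIs. The sketch below uses a made-up DOI and a hypothetical regex, not TrialIdentifier's own pattern.

```python
# Sketch of the EudraCT false-positive mechanism on DOI strings.
# The DOI is fabricated for illustration; the regex is hypothetical.
import re

EUDRACT = re.compile(r"\b\d{4}-\d{6}-\d{2}\b")

reference = "doi:10.1234/journal-2020-123456-78"  # hypothetical DOI
print(EUDRACT.findall(reference))  # ['2020-123456-78'] -- a spurious hit
```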
JetFighter Results
The updated JetFighter version detected seven papers that the previous version did not (Figure 6). In addition, it falsely detected the fluorescence microscopy image shown in Figure 7.
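This false positive is plausible at the pixel level: a multi-channel fluorescence image, like a rainbow colormap, contains many saturated hues spanning the spectrum. The toy heuristic below only illustrates this failure mode; it is not JetFighter's actual algorithm, and the threshold values are arbitrary assumptions.

```python
# Toy heuristic illustrating why fluorescence images can trigger a
# rainbow-colormap detector. Not JetFighter's actual algorithm.
import numpy as np
from PIL import Image

def rainbow_like(path: str, sat_min: int = 128, hue_bins: int = 12) -> bool:
    hsv = np.asarray(Image.open(path).convert("RGB").convert("HSV"))
    hue, sat = hsv[..., 0].ravel(), hsv[..., 1].ravel()
    saturated = hue[sat >= sat_min]                 # ignore grey/white pixels
    if saturated.size == 0:
        return False
    hist, _ = np.histogram(saturated, bins=hue_bins, range=(0, 256))
    # "Rainbow-like" if saturated pixels cover most of the hue circle --
    # true for jet colormaps, but also for multicolor fluorescence images.
    return bool((hist > 0.01 * saturated.size).sum() >= hue_bins * 2 // 3)
```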
Barzooka Results
In addition to the comparison of the previous and updated versions of the pipeline (for all tools listed above), we also compared the performance of Barzooka on two different types of input: the image files individually extracted during pipeline processing vs. a folder of PDFs of the same preprints. The main difference was thus the level of analysis: image-based (Barzooka in the pipeline) vs. page-based (stand-alone Barzooka). Two hundred papers were screened with each Barzooka version (pipeline vs. stand-alone), and the cases where the two versions disagreed were manually validated.
Discrepancies between the two Barzooka versions on the presence or absence of a figure type were detected in 103 out of 200 papers, with discrepancies found for all figure types (Figure 8).
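Such a discrepancy count can be derived as in the minimal sketch below, assuming each Barzooka run writes a table with one row per paper and one binary column per figure type; the file names and column schema are assumptions, not Barzooka's actual output format.

```python
# Minimal sketch of the discrepancy count between the two Barzooka runs.
# CSV names and 0/1 column schema are assumptions for illustration.
import pandas as pd

TYPES = ["approp", "bar", "bardot", "box", "dot", "hist", "pie", "violin"]

pipeline = pd.read_csv("barzooka_pipeline.csv", index_col="paper_id")
standalone = pd.read_csv("barzooka_standalone.csv", index_col="paper_id")

diff = pipeline[TYPES].ne(standalone[TYPES])    # True where the versions disagree
papers_with_discrepancy = diff.any(axis=1).sum()
print(f"{papers_with_discrepancy} of {len(diff)} papers show discrepancies")
print(diff.sum().sort_values(ascending=False))  # discrepancies per figure type
```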
For most categories, especially “approp”, “bardot”, “dot”, and “pie”, the stand-alone version generally delivered better results (Figure 8). We therefore recommend using the stand-alone version; applying the tool to separately extracted image files should be avoided. For the stand-alone version, the occasional errors in the “bar” and “approp” categories were due to proportional data not being recognized as such, or to histograms or bardots being misclassified. The stand-alone version also detected “hist” images more readily than the pipeline version, although some of these detections were false positives. There were no false negatives and only a few false positives for the “dot” and “bardot” categories; the most common misidentifications were dot plots with whiskers, scatter plots, and “bardots” with barely any bars visible. Similarly, many densely packed dot plots or box plots were mistaken for “violin” plots. Finally, several gene structure schematics and symbol-whisker plots with large squares were mistakenly classified as “box”.
A closer look at some of the images extracted in the pipeline container under /temp/images revealed a number of issues that explain the discrepant results in the above comparison. First, some figures were not extracted at all and were therefore never screened. Second, images were frequently extracted without the text layer (Figure 9), which may contain crucial information such as whether the y-axis displays counts or proportions. Third, images were sometimes extracted from layers or subsections (e.g. only the legend) of the original figure, resulting in several incomplete pieces of the figure (Figure 10). Finally, some extraneous non-figure images, such as logos, were extracted as well.
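The missing text layers and figure fragments are consistent with extracting embedded raster objects rather than rendering pages. The sketch below contrasts the two strategies in PyMuPDF; the exact extraction code the pipeline uses is not shown here, and the file paths are placeholders.

```python
# Sketch contrasting two PDF image-extraction strategies in PyMuPDF.
# Extracting embedded raster objects loses the vector/text layer drawn on
# top and can split one figure into fragments; rendering the full page
# keeps axis labels and legends. Paths are placeholders.
import fitz  # PyMuPDF

doc = fitz.open("preprint.pdf")
for page in doc:
    # Strategy 1: embedded raster objects only -- no text layer, and one
    # figure may come out as several separate fragments.
    for i, img in enumerate(page.get_images(full=True)):
        xref = img[0]
        raster = doc.extract_image(xref)
        with open(f"raster_p{page.number}_{i}.{raster['ext']}", "wb") as f:
            f.write(raster["image"])
    # Strategy 2: render the whole page -- keeps text, but requires figure
    # detection/cropping afterwards (the page-based stand-alone approach).
    page.get_pixmap(dpi=150).save(f"page_{page.number}.png")
```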
Acknowledgements
Nico Riedel
Peter Eckmann
Anita Bandrowski
Robert Schulz
Parya Abbasi