Via [WayBack] Artikel 13 (Uploadfilter) vs. Math – Math wins – Kristian Köhntopp – Google+:
Simulation of the proposed law effects are easy: [WayBack] Thread by @AlecMuffett: “Regards Article13, I wrote up a little command-line false-positive emulator; it tests 10 million events with a test (for copyrighted material) […]” #Article13
What it shows is that an automated test for content originality only breaks even when there are far more copyrighted-material uploads relative to original-content uploads than you would expect in practice:
about 1 in 67 postings have to be “bad” in order to break even
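That break-even figure follows directly from the accuracy number. A minimal sketch, assuming the filter has the same accuracy `a` on both good and bad items: correctly flagged bad items balance wrongly flagged good items exactly when the bad fraction `p` satisfies `a*p == (1-a)*(1-p)`, which solves to `p == 1 - a`:

```python
# Break-even condition for a filter with accuracy a (assumed equal for
# good and bad items): true positives equal false positives when
#   a * p == (1 - a) * (1 - p)   =>   p == 1 - a
# where p is the fraction of "bad" (copyrighted) uploads.
accuracy = 0.985                 # the 98.5% figure from the text
p_break_even = 1 - accuracy
print(f"break-even bad fraction: {p_break_even:.4f} "
      f"(about 1 in {1 / p_break_even:.0f} postings)")
# -> break-even bad fraction: 0.0150 (about 1 in 67 postings)
```

So with 98.5% accuracy, at least 1.5% of all uploads, about 1 in 67, would have to be infringing just to break even.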
So if less than 1% of uploads are infringing, then even at 98.5% accuracy (which is very, very good for a take-down algorithm!) you will piss off far more uploaders of good items wrongly flagged as false positives than you will catch bad items correctly flagged as bad.
When the accuracy drops further, you piss off even more original-content uploaders, while also catching fewer copyrighted-material uploads.
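The emulator from the thread can be sketched along these lines (a hedged reimplementation of the idea described in the tweet, not Alec Muffett's actual falsepos.py): draw events that are bad at a given base rate, let the test answer correctly with the given accuracy, and count what gets flagged.

```python
import random

def simulate(events=10_000_000, bad_rate=1 / 10_000, accuracy=0.995, seed=42):
    """Emulate a take-down filter: each event is 'bad' with probability
    bad_rate; the test returns the correct verdict with probability accuracy.
    Defaults mirror the figures quoted in the tweet."""
    rng = random.Random(seed)
    true_pos = false_pos = 0
    for _ in range(events):
        is_bad = rng.random() < bad_rate
        correct = rng.random() < accuracy
        flagged = is_bad if correct else not is_bad
        if flagged:
            if is_bad:
                true_pos += 1
            else:
                false_pos += 1
    return true_pos, false_pos

# Smaller run for speed; raise events to 10_000_000 to match the tweet.
tp, fp = simulate(events=1_000_000)
print(f"correctly flagged bad items: {tp}")
print(f"wrongly flagged good items:  {fp}")
# Expect roughly 50x more false positives than true positives.
```

With a 1-in-10,000 base rate and 99.5% accuracy, the wrongly flagged good items swamp the correctly flagged bad ones, which is exactly the effect the thread demonstrates.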
This is known by the far less “sexy” term False positive paradox – Wikipedia, which is a specialisation of the far more dull-sounding Base rate fallacy – Wikipedia.
Source code: [WayBack] random-code-samples/falsepos.py at master · alecmuffett/random-code-samples · GitHub
Original thread:
[WayBack] Alec Muffett on Twitter: “Regards #Article13, I wrote up a little command-line false-positive emulator; it tests 10 million events with a test (for copyrighted material, abusive material, whatever) that is 99.5% accurate, with a rate of 1-in-10,000 items actually being bad.… https://t.co/CJvxdvkiom”
and
[WayBack] next_ghost on Twitter: “And for the nerds who want to learn more, this is called a “False positive paradox”. https://t.co/CIvw2ni21q… “
–jeroen