Recently, GitHub changed how ATX headers are parsed in Markdown files.
A line such as "##Wrong", with no space after the "#" characters, is no longer rendered as a header.
While this change follows the spec, it breaks many existing repositories. I took the
README dataset that we created at source{d} and ran a simple regexp job on it in PySpark.
It turned out that more than 500,000 repositories have README files that are now rendered
with invalid headers.
Among those 500,000, more than 10,000 have more than 50 stars. The results were
uploaded to data.world.
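
To give an idea of what such a check can look like, here is a minimal sketch in plain Python. The regexp actually used in the job is not shown in this post, so the pattern and the names `BROKEN_ATX`, `has_broken_headers` and `broken_header_lines` are my assumptions for illustration. The pattern flags lines that open with one to six `#` characters followed directly by text, and it does not try to skip fenced code blocks or lines like `#hashtag`, so some false positives are expected.

```python
import re

# Assumed pattern, not the one from the original job:
# up to three leading spaces, one to six '#' characters,
# then a character that is neither whitespace nor another '#'.
BROKEN_ATX = re.compile(r"^ {0,3}#{1,6}[^\s#]", re.MULTILINE)


def has_broken_headers(readme_text):
    """Return True if any line looks like an ATX header without the space."""
    return BROKEN_ATX.search(readme_text) is not None


def broken_header_lines(readme_text):
    """Return the lines that look like ATX headers without the space."""
    return [line for line in readme_text.splitlines() if BROKEN_ATX.match(line)]


if __name__ == "__main__":
    sample = "# Title\n##Wrong\n## Correct\n"
    print(has_broken_headers(sample))
    print(broken_header_lines(sample))
```

A predicate like `has_broken_headers` could then be used to filter a collection of README texts, for example with `filter` on a PySpark RDD, though the exact shape of the dataset is not described here.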





