The Wiert Corner – irregular stream of stuff

Jeroen W. Pluimers on .NET, C#, Delphi, databases, and personal interests


GitHub: finding the oldest commit on large repositories

Posted by jpluimers on 2025/06/25

The manual process of getting back to the earliest commit of a GitHub repository is easy for small repositories, but for a large one it is very tedious.

TL;DR: there are various ways, but the easiest was the INIT Bookmarklet below.

Note: 2 weeks before the scheduled post made it to the front of the queue, I got a report¹ that it started to fail. Here it still works.

It’s hard to debug because of the functional programming approach taken.

A few of the repositories I tried this on (all returning the first commit within seconds):

The reason that INIT works is that it is based on the GitHub API. Virtually all other bookmarklets are based on Web Scraping, the modern version of old Screen Scraping. They fail over time for the same reason Screen Scraping was so fragile: the underlying UI changes, which breaks the scraping mechanism.
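
A minimal sketch of how an API-based lookup can work, assuming the commit list endpoint paginates the way GitHub documents it: with one commit per page, the page number of the rel="last" entry in the Link header equals the number of commits, and that last page holds the oldest listed commit. The names lastPageFromLinkHeader and findOldestCommit are hypothetical, and this is not necessarily how INIT itself is implemented.

```javascript
// Extract the page number of the rel="last" entry from a Link header.
// Returns 1 when the header is missing (all results fit on a single page).
function lastPageFromLinkHeader(linkHeader) {
  if (!linkHeader) return 1;
  for (const part of linkHeader.split(",")) {
    const match = part.match(/<([^>]+)>\s*;\s*rel="last"/);
    if (match) {
      const page = new URL(match[1]).searchParams.get("page");
      return page ? Number.parseInt(page, 10) : 1;
    }
  }
  return 1;
}

// Fetch the oldest commit listed for the default branch of a public repository.
// `fetchImpl` is injectable (an assumption of this sketch) so it can be tested offline.
async function findOldestCommit(owner, repo, fetchImpl = fetch) {
  const base = `https://api.github.com/repos/${owner}/${repo}/commits?per_page=1`;
  const headers = { Accept: "application/vnd.github+json" };

  const first = await fetchImpl(base, { headers });
  if (!first.ok) throw new Error(`GitHub API returned ${first.status}`);

  const lastPage = lastPageFromLinkHeader(first.headers.get("link"));
  const last = lastPage === 1 ? first : await fetchImpl(`${base}&page=${lastPage}`, { headers });
  if (!last.ok) throw new Error(`GitHub API returned ${last.status}`);

  const [commit] = await last.json();
  return { sha: commit.sha, url: commit.html_url, date: commit.commit.committer.date };
}
```

Calling findOldestCommit("owner", "repo") from the browser console or Node.js 18+ should need only two requests, regardless of repository size. Unauthenticated requests are rate limited, so heavy use would need a token.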

Because I do not have enough JavaScript debugging knowledge, the only drawback of the INIT approach for me is that it uses functional programming: besides adding logging statements I have no idea how to debug it, so when it fails, it is tough¹.

The query that led me to this quest was [Wayback/Archive] github find initial commit of repository – Google Search

Solutions I tried:

  • Manual cloning, then git log --reverse based solutions: time consuming
  • Using the Insights link: it takes forever to load the histogram on large repositories.
  • UI based scraping bookmarklets – they all fail over time and most are unmaintained
  • The INIT bookmarklet which is based on the GitHub API
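
For the manual cloning route, a sketch of a more direct alternative to git log --reverse: git rev-list --max-parents=0 HEAD lists the commits that have no parents (the root commits). Git applies commit limiting such as -n before --reverse, which is why git log --reverse -n 1 shows the newest commit instead of the oldest. The helper names below are hypothetical, and the sketch assumes Node.js, git on the PATH and an existing local clone.

```javascript
const { execFileSync } = require("child_process");

// Parse the output of `git rev-list --max-parents=0 HEAD`: one SHA per line.
// A repository can have several root commits, for example after merging unrelated histories.
function parseRootCommits(output) {
  return output
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
}

// Run git inside `repoDir` and return the SHAs of all parentless (root) commits.
function findRootCommits(repoDir) {
  const output = execFileSync("git", ["rev-list", "--max-parents=0", "HEAD"], {
    cwd: repoDir,
    encoding: "utf8",
  });
  return parseRootCommits(output);
}

// Build the GitHub web URL for a commit.
function commitUrl(owner, repo, sha) {
  return `https://github.com/${owner}/${repo}/commit/${sha}`;
}
```

The time-consuming part remains the clone itself; once it exists, findRootCommits("/path/to/clone") avoids walking the log by hand.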

INIT Bookmarklet links:

The below GreasyFork Greasemonkey userscript has a solution very similar to the above INIT one:

Things I tried were based on:

  • [Wayback/Archive] github – How to navigate to the earliest commit in a repository? – Stack Overflow (thanks [Wayback/Archive] jdw, [Wayback/Archive] Stevoisiak and [Wayback/Archive] cachius)

    Is there an easy way to navigate to the earliest commit of a large open-source project in GitHub?

    The project has over 13,000 commits as of today. I don’t want to press the “Older” button on the commit history page hundreds and hundreds of times to get to the initial commit (or first commit).

    has all kinds of manual solutions, most depending on

    git log --reverse

    and many bumping into its limitations (like it not liking other arguments, causing unreliable output), requiring piping output through other tools.

    Some quotes:

    • Clone the repository, open with the command line and run $ git log --reverse (this will show the commits in reverse order). Then you can view it on github once you have the ID (Object Name) of the first commit … something like… https://github.com/UserName/Repo/commit/6a5ace7b941120db5d2d50af6321770ddad4779e
    • I got weird results when I tried to get just the first commit of a repo with git log --reverse -n 1. Well not weird, it just ignored the --reverse flag on git 2.46.2
    • ² @andrewarchi Nicely done. Actually the first few commits are easter-eggs: stackoverflow.com/a/21981037/6309
    • On project page click tab Insights > Contributors. Select the beginning of the histogram. Click the commits number. Done.

    Mentioned Bookmarklets or derivatives that do not function any more

  • [Wayback/Archive] How do I find the date of the first commit in a GitHub repository? – Web Applications Stack Exchange has similar answers suggesting git log --reverse and the Insights link (with similar results: failing, slow or tedious), with one exception: the GitHub API has a very easy way to get the creation date of a GitHub repository, which usually is close to the first commit:

    A quick technique that worked for me was to just use curl with the GitHub API to determine the repo’s creation date. It won’t be as accurate as the first commit date for forks or other situations where the project was started before the repo was created (e.g. started on GitLab and then imported).

    The syntax I used for this was: curl -s https://api.github.com/repos/[username]/[repository-name] | jq '.created_at'.

    So I commented this:

    Note this is the creation timestamp of the repository, not the first commit. Usually these are not far apart, but they are distinctly different. For instance github.com/dotnet/runtime 2019-09-24T23:36:39Z, the first commit is github.com/dotnet/runtime/commit/480e91e54517a0fee2e64a9a24dc8319dad03186 which got created at 2001-06-19T03:37:46Z. There is only about a minute between them, but still.

    Because of this query:

    curl -s https://api.github.com/repos/dotnet/runtime/commits/480e91e54517a0fee2e64a9a24dc8319dad03186 | jq '.commit.committer.date'
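
The two curl queries above can be combined into one comparison. This is a sketch only: the function names and the injectable fetchImpl parameter are hypothetical, and it reads the same created_at and commit.committer.date fields that the jq filters select.

```javascript
// Whole seconds from `earlier` to `later`, both ISO 8601 timestamps.
// The result is negative when `later` actually precedes `earlier`.
function secondsBetween(earlier, later) {
  const start = Date.parse(earlier);
  const end = Date.parse(later);
  if (Number.isNaN(start) || Number.isNaN(end)) throw new Error("Invalid timestamp");
  return Math.round((end - start) / 1000);
}

// Fetch the repository creation timestamp and the committer date of one commit,
// then report the gap. A positive gap means the commit predates the repository.
async function compareCreationAndCommit(owner, repo, sha, fetchImpl = fetch) {
  const api = `https://api.github.com/repos/${owner}/${repo}`;
  const [repoRes, commitRes] = await Promise.all([
    fetchImpl(api),
    fetchImpl(`${api}/commits/${sha}`),
  ]);
  if (!repoRes.ok || !commitRes.ok) throw new Error("GitHub API request failed");

  const createdAt = (await repoRes.json()).created_at;
  const committedAt = (await commitRes.json()).commit.committer.date;
  return { createdAt, committedAt, gapSeconds: secondsBetween(committedAt, createdAt) };
}
```

For imported or forked projects the gap can be large, which is exactly the case where the creation date is a poor stand-in for the first commit date.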

“Footnotes”

¹ [Wayback/Archive] @JeroenWiertPluimers The ‘Init’ bookmarklet by FarhadG broke shortly after your comment and is still broken as of June 2025. –  by [Wayback/Archive] cachius

  1. [Wayback/Archive] github – How to navigate to the earliest commit in a repository? – Stack Overflow – answer with INIT bookmarklet by Stevoisiak (thanks [Wayback/Archive] Stevoisiak)
  2. [Wayback/Archive] Not working · Issue #18 · FarhadG/init where I mentioned my debugging problems.

If that gist keeps failing, I should try these answers from the same question:

  1. [Wayback/Archive] github – How to navigate to the earliest commit in a repository? – Stack Overflow – answer by Tuan Anh Tran (thanks [Wayback/Archive] Tuan Anh Tran) which uses the API based approach and does some logging:

    You can use this gist to create a bookmarklet.

    The code

    async function goToFirstCommit(owner, repo) {
        const headers = {
            "Accept": "application/vnd.github.v3+json",
            // Optional: Add authorization if you hit the rate limit (use a personal access token)
            // "Authorization": "Bearer YOUR_GITHUB_TOKEN"
        };
    
        // Step 1: Look up the default branch and the SHA of its latest commit
        const repoInfo = await fetch(`https://api.github.com/repos/${owner}/${repo}`, { headers });
        const repoData = await repoInfo.json();
        const defaultBranch = repoData.default_branch;
    
        const branchInfo = await fetch(`https://api.github.com/repos/${owner}/${repo}/branches/${defaultBranch}`, { headers });
        const branchData = await branchInfo.json();
        const latestCommitSha = branchData.commit.sha;
    
        // Step 2: Get the total number of commits in the default branch.
        // With per_page=1 every page holds one commit, so the "last" page number equals the commit count.
        const commitsInfo = await fetch(`https://api.github.com/repos/${owner}/${repo}/commits?sha=${defaultBranch}&per_page=1`, { headers });
        const linkHeader = commitsInfo.headers.get("link");
    
        // The "last" page link in the Link header contains the total number of commits
        let totalCommits;
        if (linkHeader) {
            const lastPageMatch = linkHeader.match(/&page=(\d+)>; rel="last"/);
            // Parse the captured digits so totalCommits is always a number, never a string
            totalCommits = lastPageMatch ? parseInt(lastPageMatch[1], 10) : 1;
        } else {
            // If there's no pagination, it means there is only one page of commits
            totalCommits = 1;
        }
    
        console.log(`Total commits in ${defaultBranch} branch: ${totalCommits}`);
    
        if (totalCommits != 1) {
            const firstCommitURL = `https://github.com/${owner}/${repo}/commits/${defaultBranch}/?after=${latestCommitSha}+${totalCommits - 36}`
            console.log(`First commit URL is ${firstCommitURL}`)
            window.location.href = firstCommitURL;
        }
    }
    
    async function main() {
        const currentUrl = window.location.href;
    
        const regex = /https:\/\/github\.com\/([^\/]+)\/([^\/]+)/;
        const match = currentUrl.match(regex);
    
        if (match) {
            const owner = match[1];
            const repo = match[2];
    
            console.log(`Owner: ${owner}`);
            console.log(`Repository: ${repo}`);
    
            await goToFirstCommit(owner, repo);
        } else {
            console.log("Not on a GitHub repository page.");
        }
    }
    
    main()
    

    It only works for public repos for now, since I use the GitHub API to find the latest SHA and the total number of commits.

    Use this website to create bookmarklet

    Links:

    1. [Wayback/Archive] Go to first commit of GitHub repo · GitHub
    2. [Wayback/Archive] Bookmarklet Maker
  2. [Wayback/Archive] github – How to navigate to the earliest commit in a repository? – Stack Overflow – answer by Dwza (thanks [Wayback/Archive] Dwza)

    I don’t know if this is just possible nowadays, but you can just call the correct link :)

    Repo

    https://github.com/{owner}/{repo}/commits
    

    This should actually list all commits, starting with the latest (or earliest… it’s the most recent).

    For example, let’s take shopware – See repo commits

    https://github.com/shopware/shopware/commits
    

    Specific Branch

    This also works for specific branches e.g.

    https://github.com/{owner}/{repo}/commits/{branch}
    

    Staying with shopware – See branch specific commits

    https://github.com/shopware/shopware/commits/6.5.x
    

    So there is no need for any kind of scripts, crazy long functions or so.

    Links:

    1. [Wayback/Archive] https://github.com/shopware/shopware/commits
    2. [Wayback/Archive] https://github.com/shopware/shopware/commits/6.5.x

    First commit page and first commit (which is identical for the main branch, in this case trunk, and the branch 6.5.x):

² Early Go programming language commits: [Wayback/Archive] mercurial – What’s the story behind the revision history of Go? – Stack Overflow with a few puns starring Brian Kernighan, the C programming language, and ANSI C.

--jeroen
