sed: convert Google Drive urls to direct download ones
Posted by jpluimers on 2017/03/14
One of the things I missed after moving most of my things from copy.com to Google Drive was the direct (public) download URLs that copy.com provided. Dropbox has them as well, but the Google Drive UI lacks them.
There is a URL format that does allow for direct download though:
While Google aims for Drive to be a competent Dropbox competitor, there’s one small but key feature that isn’t easy: sharing direct download links. Fortunately, you can create your own.
You can do a similar replacement for Google Doc URLs: How to Create Direct Download Links for Files on Google Drive
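The Google Drive conversion seems straightforward: it turns either of these URL forms into https://drive.google.com/uc?export=download&id=FILE_ID:
- https://drive.google.com/open?id=FILE_ID
- https://drive.google.com/file/d/FILE_ID/view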
There are tons of RegEx examples for doing the first conversion at Regex to modify Google Drive shared file URL – Stack Overflow, but
- they don’t cover both conversions
- they use non-greedy (.*?) capturing groups, which are tricky, introduce question-mark escaping issues in bash, and are not supported by many sed implementations
Since I’m a command-line person, I’ve opted for a sed conversion that wasn’t in the above list. I chose sed because it can convert either a single line or a complete file in one go.
There are a few indispensable resources for getting my regular expressions right:
- For getting sed expressions right: USEFUL ONE-LINE SCRIPTS FOR SED (Unix stream editor).
- The Stack Overflow Regular Expressions FAQ.
- RegexOne – Learn Regular Expressions – Lesson 1: An Introduction, and the ABCs
- Sed – An Introduction and Tutorial.
So here it goes, starting with https://drive.google.com/open?id=FILE_ID, as it’s the simplest replacement: the FILE_ID is at the end, so only the part in front of it needs to change.
First of all, the code fragments below are meant to live inside bash functions, because functions avoid the quoting hell you get with bash aliases.
Bash aliases have no parameters (any arguments simply end up after the expanded text), whereas functions do. So if you want to pass all of a function’s arguments on to a command inside that function, you have to use "$@".
sed -n 's@https://drive.google.com/open?id=@https://drive.google.com/uc?export=download\&id=@p' "$@"
A few remarks:
- you have to escape the ampersand as \& because an unescaped & in the replacement part of a sed expression stands for the whole matched text.
- note the use of @ as the regex delimiter; otherwise you end up in / escape hell.
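To see the ampersand remark in action, here is a minimal sketch that runs the same substitution with and without the backslash. The file ID ABC123 is a made-up sample value.

```shell
#!/usr/bin/env bash
# Sample open?id= URL; ABC123 is a made-up file ID.
url='https://drive.google.com/open?id=ABC123'

# Unescaped &: sed replaces it with the whole matched text, so the original
# URL prefix gets pasted into the middle of the result.
unescaped=$(printf '%s\n' "$url" | sed 's@https://drive.google.com/open?id=@https://drive.google.com/uc?export=download&id=@')

# Escaped \&: a literal ampersand, which is what the download URL needs.
escaped=$(printf '%s\n' "$url" | sed 's@https://drive.google.com/open?id=@https://drive.google.com/uc?export=download\&id=@')

printf 'unescaped: %s\n' "$unescaped"
printf 'escaped:   %s\n' "$escaped"
```

The unescaped version yields https://drive.google.com/uc?export=downloadhttps://drive.google.com/open?id=id=ABC123, which is clearly not a working link.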
The second fragment fixes https://drive.google.com/file/d/FILE_ID/view URLs, again printing only the lines it fixed:
sed -n 's@https://drive.google.com/file/d/\([^/]*\)/.*@https://drive.google.com/uc?export=download\&id=\1@p' "$@"
Some more remarks:
- The FILE_ID is captured during the match by the group \([^/]*\) and used in the replacement as \1.
- The parentheses are escaped with backslashes because that is how sed’s default basic regular expressions mark capturing groups.
- The trailing .* swallows the rest of the line (such as view?usp=sharing), so anything after the URL on the same line is dropped as well.
- I haven’t used a non-greedy \(.*?\) capturing group (sed can’t do that), but \([^/]*\)/ instead, which matches any run of non-slash characters inside the capturing group up to the first slash outside that group.
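The difference between a greedy .* and a negated character class only shows up when there are more slashes later on the line. Here is a minimal sketch comparing both captures; the sample line, including ABC123 and example.com, is made up for illustration.

```shell
#!/usr/bin/env bash
# A Drive URL followed by more text that also contains slashes
# (ABC123 and example.com are made-up sample values).
line='https://drive.google.com/file/d/ABC123/view see https://example.com/x'

# Greedy .* inside the group: it grabs as much as possible and only gives back
# characters up to the LAST slash on the line, so it overshoots the file ID.
greedy=$(printf '%s\n' "$line" | sed -n 's@https://drive.google.com/file/d/\(.*\)/.*@\1@p')

# Negated class [^/]*: it cannot cross a slash, so it stops at the first slash
# after the file ID, which mimics a non-greedy match.
negated=$(printf '%s\n' "$line" | sed -n 's@https://drive.google.com/file/d/\([^/]*\)/.*@\1@p')

printf 'greedy:  %s\n' "$greedy"
printf 'negated: %s\n' "$negated"
```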
The final part is combining both replacements into one sed command:
sed 's@https://drive.google.com/open?id=@https://drive.google.com/uc?export=download\&id=@;s@https://drive.google.com/file/d/\([^/]*\)/.*@https://drive.google.com/uc?export=download\&id=\1@' "$@"
- I removed both the -n (suppress automatic printing) and the p flag (print lines where a replacement was made), so everything is printed: non-matching lines unchanged and matching lines with the replacement applied.
- The semicolon lets you combine multiple sed commands in one expression.
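Wrapped in a bash function, as suggested at the start, the combined command could look like the sketch below. The function name gdrive_direct is hypothetical, the sample file IDs are made up, and the capture uses [^/]* so it stops at the first slash after the file ID. The second part shows why "$@" matters: a file name containing a space reaches sed as a single argument.

```shell
#!/usr/bin/env bash
# Hypothetical function name. "$@" passes any file names through unchanged;
# with no arguments, sed reads standard input instead.
gdrive_direct() {
  sed 's@https://drive.google.com/open?id=@https://drive.google.com/uc?export=download\&id=@;s@https://drive.google.com/file/d/\([^/]*\)/.*@https://drive.google.com/uc?export=download\&id=\1@' "$@"
}

# Line mode: one URL of each kind plus a line without a Drive URL
# (AAA and BBB are made-up file IDs).
converted=$(printf '%s\n' \
  'https://drive.google.com/open?id=AAA' \
  'https://drive.google.com/file/d/BBB/view?usp=sharing' \
  'no Drive link here' | gdrive_direct)
printf '%s\n' "$converted"

# File mode: the quoted "$@" keeps "my links.txt" as one argument.
tmpdir=$(mktemp -d)
printf '%s\n' 'https://drive.google.com/open?id=CCC' > "$tmpdir/my links.txt"
from_file=$(gdrive_direct "$tmpdir/my links.txt")
rm -r "$tmpdir"
printf '%s\n' "$from_file"
```

For a whole file you would typically redirect the output, for example gdrive_direct links.txt > direct-links.txt.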