Purpose: downloads and formats data about a YouTube channel's videos.
Writes outputs to files in the working directory: - ./channel_full.json - all data extracted by youtube-dl - ./channel_wof.json - same, but cleaned up, with formats and captions removed - ./channel_hr.tsv - tab-separated-values to describe each video - ./channel_page.md - nice presentation of the video list, in markdown Depends on yt-dlp and jq.
2021-230 initial development
- emojis in markdown output
- main heading in markdown output
- faster markdown generation
2021-359 update to handle the absence of dislike data
- use yt-dlp instead of youtube-dl
- fix condition syntax
- fix markdown output
2023-236 reformat/redocument a tad
Source code (perhaps slightly corrupted) is as follows.
if [ -z "$1" ] then echo `basename "$0"`: error: no channel URL given >$source2 exit 1 fi if [ ! -s channel_full.json ] then echo "Downloading channel data ..." >$source2 if yt-dlp -j --max-filesize 0k "$1" >channel_full.json then : else echo `basename "$0"`: error: could not download channel data fi fi if [ -s channel_full.json ] $source$source [ ! -s channel_wof.json ] then echo "Cleaning JSON ..." >$source2 # .formats, .requested_formats, .automatic_captions, .subtitles take up # the majority of the data-size, and can be rederived # by yt-dlp when necessary. # channel_full.json might also contain duplicates, and unique_by(.id) # messes up the date-sorting. jq -c 'del(.["formats", "requested_formats", "automatic_captions", "subtitles"])' \n
channel_wof.json fi if [ -s channel_wof.json ] $source$source [ ! -s channel_hr.tsv ] then echo "Generating TSV ..." >$source2 # extract a bunch of fields, reformat them as necessary, # and write them as tab-separated values jq -r 'include "timefmt"; [.webpage_url, "https://i.ytimg.com/vi/" + .id + "/mqdefault.jpg", .id, (.upload_date | hdate), (.duration | csts), .view_count, .like_count, .fulltitle, (.description | gsub(" "; "") | gsub("(?[a-z]+:[^ ]+)"; "<(.a)>"))] | @tsv' \n channel_hr.tsv fi if [ -s channel_hr.tsv ] $source$source [ ! -s channel_page.md ] then echo "Generating Markdown ..." >$source2 # this is such a nasty hack. i'm sorry. IFS=`echo -e ' '` # some videos in the json dumps will not have .channel == REAL_CHANNEL_NAME # and that may be the case for the most recent one # however, all channels on youtube (for now) have # a "REAL_CHANNEL_NAME - Videos" playlist, so this *should* give # the right channel name. if not, uh, sorry? CHANNAME=`jq -rs 'limit(1; . | select(.playlist == .channel + " - Videos")) | .channel' channel_page.md cat channel_hr.tsv | \n while read -r VIDURL TNURL VIDID ULDATE VDUR VIEWC LIKES TITLE VDESC do # todo: we may want to sanitise video titles and descriptions echo -e " ## [$TITLE][$VIDID] [![Thumbnail]($TNURL)][$VIDID]" echo -e " ### 📅 Uploaded $ULDATE | 🕐 Duration $VDUR | 👀 $VIEWC views" \n "| 👍 $LIKES likes / 👎 dislike count unavailable" echo -e " [$VIDID]: $VIDURL" echo -e "$VDESC" done >>channel_page.md fi echo "Done!" >$source2
All scripts | dkl9 home