Downloading a Podcast with PowerShell


Back in 2012, I wrote a post about how to scrape a podcast (or any web page, for that matter) from the Internet. My goal in that post was to prove that it is possible to scrape information from web pages with PowerShell. However, if you’re trying to download all the episodes, I’d suggest a better strategy: use the podcast’s RSS feed. In this post, I’m going to show you just that…

Note that I’m using the same podcast I used back then, to highlight the differences. Here is the feed for the Hanselminutes podcast (which is one of my favorites):

http://feeds.feedburner.com/HanselminutesCompleteMP3

In PowerShell you can use any of the .NET libraries. Combining this with PowerShell’s pipeline makes shell programming heaven! With $feedURL set to the link above, a single line downloads the RSS feed and parses it as XML:

$xml = ([xml](New-Object Net.WebClient).DownloadString($feedURL));

How cool is that? PowerShell then lets you navigate the parsed XML with plain property access. This returns every episode in the feed:

$xml.rss.channel.item

And you can use the pipeline to process each item:

$xml = ([xml](New-Object Net.WebClient).DownloadString($feedURL));
$xml.rss.channel.item | foreach{
    # do something
}

For each item, you can get its data through the special variable $_, which points to the current item in the loop. For instance:

$itemURL = (New-Object System.Uri($_.enclosure.url)); 
$title = $_.title;
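
To see what $_ holds without touching the network, here is a minimal sketch that parses a small inline feed. The two items and their example.com URLs are made up for illustration; a real feed will carry more elements per item.

```powershell
# Hypothetical two-item feed, inlined so the example runs offline.
$sample = [xml]@'
<rss version="2.0">
  <channel>
    <title>Sample Feed</title>
    <item>
      <title>Episode 2</title>
      <enclosure url="http://example.com/ep2.mp3" type="audio/mpeg" />
    </item>
    <item>
      <title>Episode 1</title>
      <enclosure url="http://example.com/ep1.mp3" type="audio/mpeg" />
    </item>
  </channel>
</rss>
'@

# Inside the loop, $_ is the current <item> element. Child elements and
# attributes are both reachable with plain property syntax.
$episodes = $sample.rss.channel.item | ForEach-Object {
    [pscustomobject]@{
        Title = $_.title
        Url   = $_.enclosure.url
    }
}

$episodes | Format-Table -AutoSize
```

Collecting the values into objects first makes it easy to inspect the feed before downloading anything.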

Downloading the file is just as easy:

(New-Object System.Net.WebClient).DownloadFile($itemURL, $filePath)
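
If you run the script more than once, you may not want to fetch episodes you already have. The sketch below wraps the same DownloadFile call in a helper that skips existing files. Save-FileIfMissing is a hypothetical name of my own, not a built-in cmdlet, and the check is only by file name, so a partially downloaded file would also be skipped.

```powershell
# Hypothetical helper: downloads $Url to $FilePath unless the file is already there.
# Returns $true when a download happened and $false when it was skipped.
function Save-FileIfMissing {
    param(
        [Parameter(Mandatory)] [string] $Url,
        [Parameter(Mandatory)] [string] $FilePath
    )

    if (Test-Path -LiteralPath $FilePath) {
        return $false
    }

    $client = New-Object System.Net.WebClient
    try {
        $client.DownloadFile($Url, $FilePath)
    }
    finally {
        # WebClient is disposable, so release it even if the download throws.
        $client.Dispose()
    }
    return $true
}

# Usage (assumes $itemURL and $filePath are set as in the snippets above):
# Save-FileIfMissing -Url $itemURL -FilePath $filePath
```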

Finally, since we’re going to use the episode titles as file names (to keep a neat collection), we may run into trouble if a title contains characters that are illegal in file names (for instance ‘:’). To avoid that, we write a simple function that replaces those characters with an underscore using a regular expression.

$invalidChars = [System.IO.Path]::GetInvalidFileNameChars(); 
$invalids = New-Object System.String($invalidChars,0,$invalidChars.Length) 
$regex = New-Object Regex([System.String]::Format("[{0}]", [Regex]::Escape($invalids)));

function RemoveIllegal($str){ 
    return $regex.Replace($str, "_"); 
}
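
Here is a quick, self-contained check of the function. The sample title is made up. Keep in mind that GetInvalidFileNameChars reports a different set on each operating system, so I test with ‘/’, which is invalid on Windows, Linux and macOS alike, whereas ‘:’ is only guaranteed to be replaced on Windows.

```powershell
# Build a character class from whatever the current OS reports as invalid.
$invalidChars = [System.IO.Path]::GetInvalidFileNameChars()
$invalids = -join $invalidChars
$regex = [regex]('[{0}]' -f [Regex]::Escape($invalids))

function RemoveIllegal($str) {
    return $regex.Replace($str, '_')
}

# Hypothetical title containing a character that is illegal in file names.
$clean = RemoveIllegal 'Rock/Paper/Scissors'
$clean   # Rock_Paper_Scissors
```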

Annnnnddd, here’s the whole code. Hopefully, you’ll enjoy it:

$feedURL = "http://feeds.feedburner.com/HanselminutesCompleteMP3"; 
$outputDirectory = "C:\podcast";

$invalidChars = [System.IO.Path]::GetInvalidFileNameChars(); 
$invalids = New-Object System.String($invalidChars,0,$invalidChars.Length) 
$regex = New-Object Regex([System.String]::Format("[{0}]", [Regex]::Escape($invalids)));

function RemoveIllegal($str){ 
    return $regex.Replace($str, "_"); 
}

New-Item -ItemType Directory -Force -Path $outputDirectory


$xml = ([xml](New-Object Net.WebClient).DownloadString($feedURL));

$i = $xml.rss.channel.item.Count

$xml.rss.channel.item | foreach{    
    $itemURL = (New-Object System.Uri($_.enclosure.url)); 
    $fileName = RemoveIllegal("$i. " + $_.title + ".mp3"); 
    $filePath = Join-Path $outputDirectory $fileName 
    $i--;

    (New-Object System.Net.WebClient).DownloadFile($itemURL, $filePath) 
}

Comments

Comment by Sirwan
thanks my friend, it works, it seems that powershell script is case-sensitive, my podcast folder was "Podcast", I fixed that to "podcast" then it worked. thanks
Comment by Alireza Noori

Hmmmm. I haven't had this problem yet. Thanks for sharing. I'm glad it worked.