Capture m3u8 Links and Download Videos
I wanted to download some embedded videos from a website, so I wrote a small JavaScript script with the help of an AI, in this case ChatGPT. The AI was also meant to help me become more familiar with JavaScript, which I had only known superficially.
My takeaway: you need to be very specific about what you want and check the code very carefully. The longer the conversation, or the more functions you ask for at once, the more likely the AI is to get confused at some point. Still, for me it’s a good way to get back into programming and to learn a new programming language. Checking the code against the official documentation is time-consuming at first, but I want to understand the code, and I learned a lot doing so.
As always: Don’t run code or commands you don’t understand.
I haven’t had a chance to test any AIs specialized for programming yet, but it’s on my to-do list, especially in my local Ollama environment.
How it works
The script uses Puppeteer to drive a headless browser and pull the m3u8 addresses from the network responses. We collect these and print the number of links found. Next, we use youtube-dl to read the metadata of each video and, from it, the resolution. Then the best quality up to 1080p is downloaded, with a fallback to the best available format if that fails, and saved under a unique file name.
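The capture step boils down to a single URL check. Here is a minimal sketch of that check, assuming playlist URLs may carry query strings such as access tokens. `isM3u8Url` is a hypothetical helper name and the example URLs are made up:

```javascript
// Hypothetical helper: decide whether a response URL points to an HLS playlist.
function isM3u8Url(rawUrl) {
  try {
    // Look at the path only, so "?token=..." does not hide the extension
    return new URL(rawUrl).pathname.toLowerCase().endsWith('.m3u8');
  } catch (err) {
    // Not a parseable absolute URL
    return false;
  }
}

console.log(isM3u8Url('https://cdn.example.com/video/master.m3u8?token=abc')); // true
console.log(isM3u8Url('https://cdn.example.com/video/segment1.ts'));           // false
```

A check like this could be used inside a Puppeteer `response` handler in place of a plain `endsWith` on the full URL.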
Installing the prerequisites
The script has been successfully tested with Node.js v12.22.9 and npm v10.7.0.
sudo apt install nodejs npm
npm install puppeteer youtube-dl
Execution
In the script, simply replace the URL http://example.com with the address of the website you want to capture videos from. Then run the script as follows:
node capture_m3u8.js
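Editing the script for every site gets tedious. As a possible variation, the URL could be read from the command line instead. This is only a sketch and not what the script below does. `resolveTargetUrl` is a hypothetical helper, and the example URL is made up:

```javascript
// Hypothetical helper: take the target URL from the command line,
// falling back to the placeholder used in the script.
function resolveTargetUrl(argv, fallback = 'http://example.com') {
  // argv[0] is the node binary, argv[1] the script path, argv[2] the first user argument
  const candidate = argv[2];
  if (!candidate) {
    return fallback;
  }
  // new URL() throws on malformed input, which surfaces typos early
  return new URL(candidate).href;
}

// Simulated argument list; in the real script this would be process.argv
const targetUrl = resolveTargetUrl(['node', 'capture_m3u8.js', 'https://example.org/videos']);
console.log(`Target: ${targetUrl}`);
```

With something like this in place, the script could be started as `node capture_m3u8.js https://example.org/videos`, and the result passed to `page.goto`.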
Code
const puppeteer = require('puppeteer');
const { exec } = require('child_process');
const path = require('path');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  const m3u8Links = [];

  // Listen to all network responses and collect .m3u8 URLs
  page.on('response', (response) => {
    const url = response.url();
    // Check the path only, so playlists with a query string (e.g. a token) also match
    const isPlaylist = new URL(url).pathname.endsWith('.m3u8');
    // Skip URLs that were already captured to avoid duplicate downloads
    if (isPlaylist && !m3u8Links.includes(url)) {
      m3u8Links.push(url);
    }
  });

  // Navigate to the target website
  await page.goto('http://example.com', { waitUntil: 'networkidle2' });

  // Wait 5 more seconds for late requests. A plain timer is used here
  // because page.waitForTimeout is not available in newer Puppeteer versions.
  await new Promise(resolve => setTimeout(resolve, 5000));
  await browser.close();

  // Print the number of links found
  const totalLinks = m3u8Links.length;
  console.log(`Found ${totalLinks} .m3u8 links`);
  // Helper function to download a link with youtube-dl
  const downloadLink = (url, index) => {
    return new Promise((resolve, reject) => {
      // Fetch video metadata as JSON to get the resolution
      exec(`youtube-dl -J "${url}"`, (error, stdout, stderr) => {
        if (error) {
          console.error(`Error fetching metadata for ${url}: ${stderr}`);
          reject(error);
          return;
        }
        let metadata;
        try {
          metadata = JSON.parse(stdout);
        } catch (parseError) {
          // Reject rather than throw: an exception inside this callback would crash the script
          console.error(`Error parsing metadata for ${url}: ${parseError.message}`);
          reject(parseError);
          return;
        }
        const resolution = metadata.height;
        // Construct the output filename
        const quality = resolution ? `${resolution}p` : 'best';
        const outputFilename = path.join(__dirname, `video_${index + 1}_${quality}.mp4`);
        // Try to download the best quality up to 1080p
        const command = `youtube-dl -f "bestvideo[height<=1080]+bestaudio/best[height<=1080]" -o "${outputFilename}" "${url}"`;
        exec(command, (downloadError, downloadStdout, downloadStderr) => {
          if (downloadError) {
            console.error(`Error downloading up to 1080p ${url}: ${downloadStderr}`);
            // Fallback to the best available quality
            const fallbackCommand = `youtube-dl -f best -o "${outputFilename}" "${url}"`;
            exec(fallbackCommand, (fallbackError, fallbackStdout, fallbackStderr) => {
              if (fallbackError) {
                console.error(`Error downloading fallback ${url}: ${fallbackStderr}`);
                reject(fallbackError);
                return;
              }
              console.log(`Successfully downloaded ${index + 1} of ${totalLinks} (fallback): ${url}`);
              resolve();
            });
          } else {
            console.log(`Successfully downloaded ${index + 1} of ${totalLinks} (up to 1080p): ${url}`);
            resolve();
          }
        });
      });
    });
  };
  // Sequentially download each .m3u8 link; a failed link does not stop the rest
  for (let i = 0; i < totalLinks; i++) {
    try {
      await downloadLink(m3u8Links[i], i);
    } catch (error) {
      console.error(`Failed to download ${m3u8Links[i]}: ${error.message}`);
    }
  }
})();
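One thing to keep in mind: the script builds shell command strings with the URL interpolated, so a URL containing quotes or shell metacharacters could break or alter the command. A possible alternative, sketched here under the assumption that `youtube-dl` is on the PATH, is `child_process.execFile`, which passes the arguments as an array without going through a shell. `buildDownloadArgs` and `runYoutubeDl` are hypothetical names that are not part of the script above:

```javascript
const { execFile } = require('child_process');

// Hypothetical helper: argument list for a download capped at 1080p
function buildDownloadArgs(url, outputFilename) {
  return [
    '-f', 'bestvideo[height<=1080]+bestaudio/best[height<=1080]',
    '-o', outputFilename,
    url,
  ];
}

// Hypothetical wrapper: run youtube-dl without a shell and return a Promise
function runYoutubeDl(args) {
  return new Promise((resolve, reject) => {
    execFile('youtube-dl', args, (error, stdout, stderr) => {
      if (error) {
        reject(new Error(stderr || error.message));
        return;
      }
      resolve(stdout);
    });
  });
}

// Usage inside an async function (not executed here):
// await runYoutubeDl(buildDownloadArgs(m3u8Links[0], 'video_1.mp4'));
```

Because each argument is handed to the program as-is, no manual quoting of the URL or the file name is needed.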