
Scraping the bilibili Leaderboard

2020-11-20 17:29:49 · Short Videos

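As everyone knows, Bilibili is a "study app". Hahaha. Today we're going to scrape Bilibili's leaderboard. Without further ado, let's get started.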

Analysis:

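Looking at Figure 1, each video's info sits inside an li tag, which I can get with XPath. For each video I want the cover image, the play count, the overall score and the video link. Everything except the cover is available on this page; I later found the cover on another page, which I'll come back to.

Figure 1: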

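Open a video link to get to the playback page, press F12, switch to the Network tab and let the video play. You'll see XHR requests appearing over and over (the files in Figure 2), all ending in .m4s.

Figure 2: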

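At first glance, the video seems to be stitched together from many small .m4s pieces. I copied one of the links and opened it; the result is Figure 3.

Figure 3: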
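Whether these really are separate files is worth checking. In the playinfo JSON further down, every stream carries a SegmentBase with byte ranges (Initialization and indexRange), which suggests the player may instead be requesting byte ranges of a single .m4s file through HTTP Range headers; the Request Headers of those XHR entries in DevTools would confirm it. A small sketch that turns those fields into Range values, assuming the ranges are inclusive as in HTTP Range syntax and that the index follows the init segment directly (the function names are hypothetical):

```python
def parse_range(spec):
    """Split an inclusive byte range such as '0-1005' into (start, end)."""
    start, end = spec.split("-")
    return int(start), int(end)


def range_header(spec):
    """HTTP Range header value for an inclusive byte range, e.g. 'bytes=0-1005'."""
    start, end = parse_range(spec)
    return "bytes={}-{}".format(start, end)


def header_bytes(segment_base):
    """Bytes from the start of the file to the end of the index (sidx) range.

    Assumes the init segment starts at byte 0 and the index range follows it,
    as in the SegmentBase values of the sample playinfo JSON.
    """
    _, index_end = parse_range(segment_base["indexRange"])
    return index_end + 1


# SegmentBase of the 1080P AVC stream in the sample playinfo JSON
sample = {"Initialization": "0-1005", "indexRange": "1006-1385"}
print(range_header(sample["Initialization"]))  # bytes=0-1005
print(header_bytes(sample))                    # 1386
```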

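This raises another question: even if we can fetch one such file, how do we collect all of these links? I searched for quite a while without finding them, so I tried a different angle: maybe a complete video link is stored somewhere. I eventually found it, hidden in the Elements panel all along. Search there for window and you'll see what Figure 4 shows.

Figure 4: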

Next, open the page source and look at it. At first glance it looks like JSON, which we can extract with a regular expression. Let's analyze it:

```python
# Abridged copy of the __playinfo__ JSON from one video page. Signed query strings
# are cut at "&..."; the snake_case duplicates (base_url, mime_type, ...), the
# backupUrl/backup_url lists, frameRate, sar and startWithSap are left out.
dic = {
    "code": 0, "message": "0", "ttl": 1,
    "data": {
        "from": "local", "result": "suee", "message": "",
        "quality": 80, "format": "flv", "timelength": 146787,
        "accept_format": "hdflv2,flv,flv720,flv480,mp4",
        "accept_description": ["高清 1080P+", "高清 1080P", "高清 720P", "清晰 480P", "流畅 360P"],
        "accept_quality": [112, 80, 64, 32, 16],
        "video_codecid": 7, "seek_param": "start", "seek_type": "offset",
        "dash": {
            "duration": 147, "minBufferTime": 1.5, "min_buffer_time": 1.5,
            "video": [
                {"id": 80, "baseUrl": "http://cn-zjhz2-cmcc-v-05.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30080.m4s?expires=1605283871&...",
                 "bandwidth": 1288827, "mimeType": "video/mp4", "codecs": "avc1.640032", "width": 1920, "height": 1080,
                 "SegmentBase": {"Initialization": "0-1005", "indexRange": "1006-1385"}, "codecid": 7},
                {"id": 80, "baseUrl": "http://cn-zjhz2-cmcc-v-04.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30077.m4s?expires=1605283871&...",
                 "bandwidth": 777178, "mimeType": "video/mp4", "codecs": "hev1.1.6.L120.90", "width": 1920, "height": 1080,
                 "SegmentBase": {"Initialization": "0-1178", "indexRange": "1179-1558"}, "codecid": 12},
                {"id": 64, "baseUrl": "http://cn-zjhz2-cmcc-v-03.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30064.m4s?expires=1605283871&...",
                 "bandwidth": 937924, "mimeType": "video/mp4", "codecs": "avc1.640028", "width": 1280, "height": 720,
                 "SegmentBase": {"Initialization": "0-1003", "indexRange": "1004-1383"}, "codecid": 7},
                {"id": 64, "baseUrl": "http://cn-zjhz2-cmcc-v-04.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30066.m4s?expires=1605283871&...",
                 "bandwidth": 567464, "mimeType": "video/mp4", "codecs": "hev1.1.6.L120.90", "width": 1280, "height": 720,
                 "SegmentBase": {"Initialization": "0-1179", "indexRange": "1180-1559"}, "codecid": 12},
                {"id": 32, "baseUrl": "http://cn-zjhz2-cmcc-v-04.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30032.m4s?expires=1605283871&...",
                 "bandwidth": 557917, "mimeType": "video/mp4", "codecs": "avc1.64001F", "width": 852, "height": 480,
                 "SegmentBase": {"Initialization": "0-1007", "indexRange": "1008-1387"}, "codecid": 7},
                {"id": 32, "baseUrl": "http://cn-zjhz2-cmcc-v-04.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30033.m4s?expires=1605283871&...",
                 "bandwidth": 339786, "mimeType": "video/mp4", "codecs": "hev1.1.6.L120.90", "width": 852, "height": 480,
                 "SegmentBase": {"Initialization": "0-1182", "indexRange": "1183-1562"}, "codecid": 12},
                {"id": 16, "baseUrl": "http://cn-zjhz2-cmcc-v-05.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30011.m4s?expires=1605283871&...",
                 "bandwidth": 217071, "mimeType": "video/mp4", "codecs": "hev1.1.6.L120.90", "width": 640, "height": 360,
                 "SegmentBase": {"Initialization": "0-1179", "indexRange": "1180-1559"}, "codecid": 12},
                {"id": 16, "baseUrl": "http://cn-zjhz2-cmcc-v-05.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30016.m4s?expires=1605283871&...",
                 "bandwidth": 353246, "mimeType": "video/mp4", "codecs": "avc1.64001E", "width": 640, "height": 360,
                 "SegmentBase": {"Initialization": "0-1028", "indexRange": "1029-1408"}, "codecid": 7},
            ],
            "audio": [
                {"id": 30280, "baseUrl": "http://cn-zjhz2-cmcc-v-02.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30280.m4s?expires=1605283871&...",
                 "bandwidth": 117388, "mimeType": "audio/mp4", "codecs": "mp4a.40.2",
                 "SegmentBase": {"Initialization": "0-907", "indexRange": "908-1299"}, "codecid": 0},
                {"id": 30216, "baseUrl": "http://cn-zjhz2-cmcc-v-04.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30216.m4s?expires=1605283871&...",
                 "bandwidth": 67328, "mimeType": "audio/mp4", "codecs": "mp4a.40.2",
                 "SegmentBase": {"Initialization": "0-932", "indexRange": "933-1324"}, "codecid": 0},
                {"id": 30232, "baseUrl": "http://cn-zjhz2-cmcc-v-01.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30232.m4s?expires=1605283871&...",
                 "bandwidth": 117388, "mimeType": "audio/mp4", "codecs": "mp4a.40.2",
                 "SegmentBase": {"Initialization": "0-907", "indexRange": "908-1299"}, "codecid": 0},
            ],
        },
        "support_formats": [
            {"quality": 112, "format": "hdflv2", "new_description": "1080P 高码率", "display_desc": "1080P", "superscript": "高码率"},
            {"quality": 80, "format": "flv", "new_description": "1080P 高清", "display_desc": "1080P", "superscript": ""},
            {"quality": 64, "format": "flv720", "new_description": "720P 高清", "display_desc": "720P", "superscript": ""},
            {"quality": 32, "format": "flv480", "new_description": "480P 清晰", "display_desc": "480P", "superscript": ""},
            {"quality": 16, "format": "mp4", "new_description": "360P 流畅", "display_desc": "360P", "superscript": ""},
        ],
    },
    "session": "b80375f9a61937c9ce93ee13909c1bca",
}

# Print the structure level by level
for key, value in dic['data'].items():
    print(key, ':', value)
print('===================================')
for key, value in dic['data']['dash'].items():
    print(key, ':', value)
print('===================================')
for key, value in dic['data']['support_formats'][0].items():
    print(key, ':', value)
```
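Pulling this JSON out of the page source with a regular expression, as described above, can be sketched as follows. The pattern follows the same idea as the post (text between `__playinfo__=` and the next `</script>`); tolerating a trailing semicolon is a defensive assumption, and returning None when a page has no playinfo script avoids calling `.group(1)` on a failed match. The function name is hypothetical.

```python
import json
import re

# Capture everything between "__playinfo__=" and the next closing script tag
PLAYINFO_RE = re.compile(r"__playinfo__=(.*?)</script>", re.S)


def extract_playinfo(html_text):
    """Return the window.__playinfo__ object as a dict, or None if the page has none."""
    match = PLAYINFO_RE.search(html_text)
    if match is None:
        return None
    # Tolerate a trailing semicolon, in case the page has one
    raw = match.group(1).rstrip().rstrip(";")
    return json.loads(raw)


page = '<script>window.__playinfo__={"code":0,"data":{"timelength":146787}}</script><script>x</script>'
info = extract_playinfo(page)
print(info["data"]["timelength"])  # 146787
```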

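dic is the JSON data we obtained. Peeling it apart layer by layer, I found that the video and the audio are two separate files, so we can download both and merge them afterwards. Here is what my analysis showed:

Figure 5: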

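accept_description lists the available video qualities, and accept_quality gives the ID of each one (112, 80, 64, 32 and 16 for 1080P+, 1080P, 720P, 480P and 360P). I don't have a premium membership, so the best I can get is 高清 1080P (1080P HD, ID 80). The video file is at baseUrl in the video entries, and the audio file at baseUrl in the audio entries.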
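The post's code simply takes `dash["video"][0]` and `dash["audio"][0]`. In the sample above those happen to be the 1080P AVC stream and the 30280 audio stream, but I can't confirm that ordering is guaranteed. A sketch that selects by quality ID instead; preferring AVC (codecid 7, codecs "avc1...") over HEVC (codecid 12) is an assumption, since AVC tends to be more widely playable, and the function name is hypothetical:

```python
def pick_streams(playinfo, quality=None):
    """Return (video_url, audio_url) from a playinfo dict that has a "dash" section.

    quality is one of the IDs listed in accept_quality (e.g. 80 for 1080P);
    None picks the highest ID present under dash["video"].
    """
    dash = playinfo["data"]["dash"]
    videos = dash["video"]
    if quality is None:
        quality = max(v["id"] for v in videos)
    candidates = [v for v in videos if v["id"] == quality]
    if not candidates:
        raise ValueError("quality {} is not offered for this video".format(quality))
    # Stable sort: AVC streams (codecid 7) move ahead of HEVC ones (codecid 12)
    candidates.sort(key=lambda v: v.get("codecid") != 7)
    # Highest-bandwidth audio track
    audio = max(dash["audio"], key=lambda a: a["bandwidth"])
    return candidates[0]["baseUrl"], audio["baseUrl"]
```

With the sample JSON above, `pick_streams(dic)` would return the 30080 (1080P AVC) video URL and the 30280 audio URL.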

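Then, on a hunch, I copied the string underlined in red in Figure 1 and searched for it in the Elements panel of the video page, and there it was (Figure 7). Opening that link showed the video's cover, and other video pages gave their covers too, so a regex can pull it out.

Figure 7: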
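The post's regex matches the exact attribute order `data-vue-meta="true" itemprop="image" content="..."`. If that order ever changes, the regex silently stops matching. A sketch of a more tolerant alternative using the standard library's html.parser, assuming the cover is the first `<meta itemprop="image">` on the page (class and function names are hypothetical):

```python
from html.parser import HTMLParser


class CoverFinder(HTMLParser):
    """Collects the content attribute of the first <meta itemprop="image"> tag."""

    def __init__(self):
        super().__init__()
        self.cover = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self.cover is None and tag == "meta" and attrs.get("itemprop") == "image":
            self.cover = attrs.get("content")


def find_cover(html_text):
    finder = CoverFinder()
    finder.feed(html_text)
    finder.close()
    return finder.cover


page = '<meta content="http://i2.hdslb.com/bfs/archive/273ed274d5cf2556e162f8d1f7eef3b63bd2f31b.jpg" itemprop="image">'
print(find_cover(page))
```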

That completes the analysis. On to the code.

Code:

1: Import the libraries

```python
import re
from random import randint
import requests
from lxml import etree
from time import sleep
import json
import os
```

2: Create a session so cookies are shared

```python
# Create a session so cookies are shared across requests
print('Creating session')
session = requests.Session()
base_url = 'https://www.bilibili.com/'
base_headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36',
    'cookie': 'YOUR_COOKIE',  # replace with your own cookie
    'referer': 'https://www.google.com/',
}
session.get(url=base_url, headers=base_headers)
sleep(randint(3, 5))  # random pause between requests
```
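Instead of pasting the raw cookie string into several headers dicts, one option (not what the post does) is to parse it once into a dict and load it into the session's cookie jar. A minimal sketch; the function name is hypothetical:

```python
def parse_cookie_header(raw):
    """Split a browser 'cookie:' header value into a {name: value} dict."""
    cookies = {}
    for part in raw.split(";"):
        part = part.strip()
        if not part or "=" not in part:
            continue  # skip empty or malformed pieces
        name, value = part.split("=", 1)  # values may themselves contain '='
        cookies[name.strip()] = value.strip()
    return cookies


print(parse_cookie_header("a=1; CURRENT_QUALITY=80"))
```

Usage with the session above would be `session.cookies.update(parse_cookie_header(raw_cookie))`, after which the session should send those cookies without a `cookie` entry in each headers dict.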

3: Scrape the video leaderboard (I found that adding referer to the headers matters a lot here; referer is the URL of the page you came from)

```python
# Scrape the leaderboard
print('Scraping the leaderboard')
dic = {}
leaderboard_url = 'https://www.bilibili.com/v/popular/rank/all?spm_id_from=333.851.b_7072696d61727950616765546162.3'
leaderboard_headers = {
    'referer': leaderboard_url,
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36',
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    # 'br' left out: requests only decodes Brotli when the brotli package is installed
    'accept-encoding': 'gzip, deflate',
    'accept-language': 'zh-CN,zh;q=0.9,en-US;q=0.8,en;q=0.7',
    'cache-control': 'max-age=0',
}
response = session.get(url=leaderboard_url, headers=leaderboard_headers)
sleep(randint(3, 5))
html = etree.HTML(response.content)
info_list = html.xpath('//ul[@class="rank-list"]/li')
for li in info_list:
    name = li.xpath('div[2]/div[2]/a/text()')[0]  # video title
    href = 'https:' + li.xpath('div[2]/div[2]/a/@href')[0]  # video link
    score = li.xpath('div[2]/div[2]/div[2]/div/text()')[0] + '综合得分'  # overall score ("综合得分")
    play_volume = li.xpath('div[2]/div[2]/div[1]/span[1]/text()')[0].strip()  # play count
    dic[name] = [href, score, play_volume]  # title -> [link, score, play count]
```

Here I use each video's name as the dictionary key, and put the video link, overall score and play count into a list that serves as the value.
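Later steps read these values by position (dic[i][3] for the cover, dic[i][-1] for the stream links), which is easy to get wrong as more items are appended. One alternative, not what the post does, is a small dataclass whose fields mirror the list in the order the post fills it; the class and field names are hypothetical:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RankEntry:
    """One leaderboard row: [href, score, play_volume, cover, stream links] as named fields."""
    href: str
    score: str
    play_volume: str
    cover: Optional[str] = None                           # filled in from the video page
    media_urls: List[str] = field(default_factory=list)   # [video, audio] or [video]


entry = RankEntry("https://www.bilibili.com/video/example", "100综合得分", "12.3万")
entry.cover = "http://i2.hdslb.com/bfs/archive/273ed274d5cf2556e162f8d1f7eef3b63bd2f31b.jpg"
print(entry.cover is not None, entry.media_urls)
```

`default_factory=list` gives each entry its own list, so appending links to one entry does not leak into another.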

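4: Sometimes the session request failed, so I wrapped it in try/except: when the session works, the except branch never runs; when it fails, I fall back to a plain requests.get call. Don't forget to add the cookie there.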

While scraping, I put the video link and the audio link into one list and append that list to the video's list from the previous step.

```python
# Get the cover, video and audio links from each video page
print('Scraping video pages')
video_headers = {
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'accept-encoding': 'gzip, deflate',  # 'br' left out, as in step 3
    'accept-language': 'zh-CN,zh;q=0.9,en-US;q=0.8,en;q=0.7',
    'cache-control': 'max-age=0',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36',
    'referer': leaderboard_url,
}
num = 0
for i in dic.keys():
    video_url = dic[i][0]
    # Fetch the video page, which also holds the cover link
    try:
        response = session.get(url=video_url, headers=video_headers)
    except requests.RequestException:
        # Fall back to a plain request with the cookie set explicitly
        video_headers['cookie'] = 'YOUR_COOKIE'  # replace with your own cookie
        response = requests.get(url=video_url, headers=video_headers)
    text = response.text
    img_url = re.search(r'<meta data-vue-meta="true" itemprop="image" content="(.*?)">', text).group(1)
    dic[i].append(img_url)  # cover link becomes dic[i][3]
    data = re.search(r'__playinfo__=(.*?)</script><script>', text).group(1)
    data = json.loads(data)
    try:
        duration = data['data']['dash']['duration']
        minute = int(duration) // 60
        second = int(duration) % 60
        video_url = data['data']['dash']['video'][0]['baseUrl']  # video stream
        audio_url = data['data']['dash']['audio'][0]['baseUrl']  # audio stream
        dic[i].append([video_url, audio_url])  # becomes dic[i][-1]
        print(video_url)
        print(audio_url)
        print('Duration: {} min {} s'.format(minute, second))
    except KeyError:
        # A few videos use a different format (a single durl file), so there is
        # no separate audio to merge. This is rare.
        duration = data['data']['timelength'] // 1000
        minute = int(duration) // 60
        second = int(duration) % 60
        video_url = data['data']['durl'][0]['url']
        dic[i].append([video_url])
        print('Duration: {} min {} s'.format(minute, second))
```
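Wrapping the whole DASH branch in `except KeyError` also hides a genuinely missing key (for example an empty audio list would raise IndexError, while a renamed field would quietly fall into the durl branch). A sketch that checks for the dash section explicitly instead, assuming durl-style responses carry no "dash" key; the function names are hypothetical:

```python
def media_urls(playinfo):
    """Return (urls, seconds): [video, audio] for DASH pages, [video] for the durl format."""
    data = playinfo["data"]
    if "dash" in data:
        dash = data["dash"]
        urls = [dash["video"][0]["baseUrl"], dash["audio"][0]["baseUrl"]]
        seconds = int(dash["duration"])
    else:
        urls = [data["durl"][0]["url"]]
        seconds = int(data["timelength"]) // 1000  # timelength is in milliseconds
    return urls, seconds


def format_duration(seconds):
    minutes, secs = divmod(int(seconds), 60)
    return "{} min {} s".format(minutes, secs)


print(format_duration(147))  # 2 min 27 s
```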

5: Download the video and audio

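Here the request headers have to be rebuilt; without that, the download fails. The headers to copy are those of the XHR requests from Figure 2, and I noticed they all contain these two:

```python
'origin': 'https://www.bilibili.com',
'referer': 'https://www.bilibili.com/',
```

After adding them, the download succeeded.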
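The same two headers can also be sent with the standard library's urllib.request instead of requests, and the response can be streamed to disk in chunks rather than loaded whole (the post's code reads response.content into memory). A sketch; the function names are hypothetical and the 30-second timeout is an arbitrary choice:

```python
import shutil
import urllib.request

UA = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
      "(KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36")


def media_request(url, cookie=None):
    """Build a request carrying the origin/referer headers the .m4s requests use."""
    headers = {
        "User-Agent": UA,
        "Origin": "https://www.bilibili.com",
        "Referer": "https://www.bilibili.com/",
    }
    if cookie:
        headers["Cookie"] = cookie
    return urllib.request.Request(url, headers=headers)


def download(url, path, cookie=None):
    """Stream the response body to path in chunks."""
    with urllib.request.urlopen(media_request(url, cookie), timeout=30) as resp, open(path, "wb") as f:
        shutil.copyfileobj(resp, f)
```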

```python
# Download video and audio (runs for each video i, inside the loop from step 4)
print('Downloading')
headers = {
    'cookie': 'YOUR_COOKIE',  # replace with your own cookie
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36',
    'origin': 'https://www.bilibili.com',
    'referer': 'https://www.bilibili.com/',
}
path = r'C:\Users\jyj34\Desktop\bilibili\{}'.format(num)
# mkdir(): a folder-creating helper not defined in this excerpt; the code below
# only runs when it returns 1
created = mkdir(path)
if created == 1:
    video_path = os.path.join(path, '_video.mp4')
    audio_path = os.path.join(path, '_audio.mp4')
    save_path = os.path.join(path, '{}.mp4'.format(num))
    info_path = os.path.join(path, '{}.text'.format(num))
    img_path = os.path.join(path, '{}.png'.format(num))
    num += 1
    print('Start downloading video: {}'.format(i))
    with open(video_path, 'wb') as f:  # video stream
        response = requests.get(dic[i][-1][0], headers=headers)
        print(response.status_code)
        f.write(response.content)
    print('Video downloaded: {}'.format(i))
    if len(dic[i][-1]) > 1:  # durl-format videos have no separate audio stream
        print('Start downloading audio: {}'.format(i))
        with open(audio_path, 'wb') as f:  # audio stream
            response = requests.get(dic[i][-1][-1], headers=headers)
            f.write(response.content)
        print('Audio downloaded: {}'.format(i))
```
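The download step calls `mkdir(path)`, which is not defined in the code shown. A hypothetical stand-in, assuming from the `== 1` check that it returns 1 when it creates the folder and 0 when the folder already exists:

```python
import os


def mkdir(path):
    """Hypothetical helper: 1 if the folder was created, 0 if it already existed."""
    if os.path.isdir(path):
        return 0
    os.makedirs(path)  # also creates missing parent folders
    return 1
```

With this behavior, re-running the script skips videos whose numbered folder already exists.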

6: Download the cover and save the info:

```python
# Download the cover (separate headers dict so the download headers above are not overwritten)
img_headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36',
}
with open(img_path, 'wb') as f:
    response = requests.get(url=dic[i][3], headers=img_headers)  # dic[i][3] is the cover URL
    f.write(response.content)

# Save the info: title, overall score, play count
with open(info_path, 'w', encoding='utf-8') as f:
    info = i + '\n' + dic[i][1] + '\n' + dic[i][2]
    f.write(info)
```

7: Merging the video and audio

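To merge the video and audio, I had to run the editor as administrator (I use PyCharm) and switch the editor's encoding to 'gbk' instead of 'utf-8'. ffmpeg itself must also be installed and on the system PATH, because the command below calls it by name.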

```python
cmd = r'ffmpeg -i {} -i {} -acodec copy -vcodec copy {}'.format(video_path, audio_path, save_path)
p = os.popen(cmd)
```
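`os.popen` hands the command string to a shell and returns without waiting for ffmpeg, and a path containing spaces would be split into several arguments. A sketch of the same ffmpeg call through `subprocess.run` with an argument list, assuming `ffmpeg` is on PATH; `merge_cmd` and `merge` are hypothetical names. The output is captured as raw bytes, so Python never decodes ffmpeg's console output, which may make the editor-encoding setting described above unnecessary.

```python
import shutil
import subprocess


def merge_cmd(video_path, audio_path, save_path):
    """The same ffmpeg invocation as an argument list (no shell, no quoting issues)."""
    return ['ffmpeg', '-i', video_path, '-i', audio_path,
            '-acodec', 'copy', '-vcodec', 'copy', save_path]


def merge(video_path, audio_path, save_path):
    """Run ffmpeg, wait for it to finish, and return its exit code."""
    if shutil.which('ffmpeg') is None:
        raise FileNotFoundError('ffmpeg was not found on PATH')
    result = subprocess.run(merge_cmd(video_path, audio_path, save_path),
                            capture_output=True)  # output kept as bytes, never decoded
    return result.returncode
```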

Full code:

```python
import re
import os
import json
from random import randint
from time import sleep

import requests
from lxml import etree

COOKIE = 'YOUR_COOKIE'  # replace with your own Bilibili cookie
USER_AGENT = ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
              '(KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36')
ACCEPT = ('text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,'
          'image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9')


def get_link_and_img():
    # Create a session and visit the home page first
    print('Creating session')
    session = requests.Session()
    base_url = 'https://www.bilibili.com/'
    base_headers = {
        'user-agent': USER_AGENT,
        'cookie': COOKIE,
        'referer': 'https://www.google.com/',
    }
    session.get(url=base_url, headers=base_headers)
    sleep(randint(3, 5))

    # Scrape the leaderboard
    print('Scraping the leaderboard')
    dic = {}
    leaderboard_url = 'https://www.bilibili.com/v/popular/rank/all?spm_id_from=333.851.b_7072696d61727950616765546162.3'
    leaderboard_headers = {
        'referer': leaderboard_url,
        'user-agent': USER_AGENT,
        'accept': ACCEPT,
        'accept-encoding': 'gzip, deflate, br',
        'accept-language': 'zh-CN,zh;q=0.9,en-US;q=0.8,en;q=0.7',
        'cache-control': 'max-age=0',
    }
    response = session.get(url=leaderboard_url, headers=leaderboard_headers)
    sleep(randint(3, 5))
    html = etree.HTML(response.content)
    for li in html.xpath('//ul[@class="rank-list"]/li'):
        name = li.xpath('div[2]/div[2]/a/text()')[0]  # video title
        href = 'https:' + li.xpath('div[2]/div[2]/a/@href')[0]  # video link
        score = li.xpath('div[2]/div[2]/div[2]/div/text()')[0] + ' composite score'
        play_volume = li.xpath('div[2]/div[2]/div[1]/span[1]/text()')[0].strip()  # play count
        dic[name] = [href, score, play_volume]

    # Scrape each video page
    print('Scraping video pages')
    video_headers = {
        'accept': ACCEPT,
        'accept-encoding': 'gzip, deflate, br',
        'accept-language': 'zh-CN,zh;q=0.9,en-US;q=0.8,en;q=0.7',
        'cache-control': 'max-age=0',
        'user-agent': USER_AGENT,
        'referer': leaderboard_url,
    }
    # Headers for the media downloads (origin/referer are required)
    download_headers = {
        'cookie': COOKIE,
        'user-agent': USER_AGENT,
        'origin': 'https://www.bilibili.com',
        'referer': 'https://www.bilibili.com/',
    }
    num = 0
    for i in dic.keys():
        video_url = dic[i][0]
        try:
            response = session.get(url=video_url, headers=video_headers)
        except requests.RequestException:
            # Retry without the session, sending the cookie explicitly
            video_headers['cookie'] = COOKIE
            response = requests.get(url=video_url, headers=video_headers)
        text = response.text
        # Cover URL
        img_url = re.search(r'<meta data-vue-meta="true" itemprop="image" content="(.*?)">', text).group(1)
        dic[i].append(img_url)
        data = json.loads(re.search(r'__playinfo__=(.*?)</script><script>', text).group(1))

        try:  # DASH format: separate video and audio streams
            dash = data['data']['dash']
            duration = dash['duration']
            urls = [dash['video'][0]['baseUrl'], dash['audio'][0]['baseUrl']]
        except KeyError:
            # A few videos are a single file, so no merge is needed (rare)
            duration = data['data']['timelength'] // 1000
            urls = [data['data']['durl'][0]['url']]
        dic[i].append(urls)
        # print('Duration: {} min {} s'.format(*divmod(int(duration), 60)))

        # Download video and audio
        print('Downloading video and audio')
        path = os.path.join(r'C:\Users\jyj34\Desktop\bilibili', str(num))
        if mkdir(path) == 1:
            video_path = os.path.join(path, '_video.mp4')
            audio_path = os.path.join(path, '_audio.mp4')
            save_path = os.path.join(path, '{}.mp4'.format(num))
            info_path = os.path.join(path, '{}.text'.format(num))
            img_path = os.path.join(path, '{}.png'.format(num))

            print('{}: video download started'.format(i))
            with open(video_path, 'wb') as f:  # video stream
                response = requests.get(urls[0], headers=download_headers)
                print(response.status_code)
                f.write(response.content)
            print('{}: video download finished'.format(i))

            if len(urls) == 2:  # only DASH videos have a separate audio stream
                print('{}: audio download started'.format(i))
                with open(audio_path, 'wb') as f:  # audio stream
                    response = requests.get(urls[1], headers=download_headers)
                    f.write(response.content)
                print('{}: audio download finished'.format(i))

            # Cover
            response = requests.get(url=img_url, headers={'User-Agent': USER_AGENT})
            with open(img_path, 'wb') as f:
                f.write(response.content)

            # Info
            with open(info_path, 'w', encoding='utf-8') as f:
                f.write(i + '\n' + dic[i][1] + '\n' + dic[i][2])

            # Merge audio and video
            if len(urls) == 2:
                composite(video_path, audio_path, save_path)
            sleep(randint(5, 8))
        else:
            print('{} has already been scraped'.format(i))
        num = num + 1


def mkdir(path):
    # Create the folder if it does not exist; return 1 if created, 0 if it already existed
    if not os.path.exists(path):
        os.makedirs(path)
        return 1
    return 0


def composite(video_path, audio_path, save_path):
    cmd = r'ffmpeg -i {} -i {} -acodec copy -vcodec copy {}'.format(video_path, audio_path, save_path)
    p = os.popen(cmd)
    # print(p.read())


if __name__ == '__main__':
    get_link_and_img()
```
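In the full code, `re.search(...).group(1)` raises `AttributeError` as soon as one page lacks the expected markup (for example, after a change to Bilibili's HTML), which aborts the whole run. A sketch that keeps the script's two regular expressions but returns `None` for a missing match so the caller can skip that video; `extract_page_info` is a hypothetical name, and the patterns only match the page structure at the time of writing.

```python
import json
import re

# Same patterns as in the full code; they depend on the page markup at the time.
COVER_RE = re.compile(r'<meta data-vue-meta="true" itemprop="image" content="(.*?)">')
PLAYINFO_RE = re.compile(r'__playinfo__=(.*?)</script><script>')


def extract_page_info(html):
    """Return (cover_url, playinfo_dict); each is None if its pattern is not found."""
    cover = COVER_RE.search(html)
    playinfo = PLAYINFO_RE.search(html)
    return (cover.group(1) if cover else None,
            json.loads(playinfo.group(1)) if playinfo else None)
```

In the loop, `if cover is None or info is None: continue` would then skip a page instead of crashing.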

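The video/audio download, the cover download and the merge step could each be moved into a function of its own, which would make the code tidier and easier to read.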

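The dictionary is laid out as follows: the key is the video title, and the value is [href, score, play_volume, img_url, [video_url, audio_url]] (for the rare single-file videos, the last element is just [video_url]).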
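Positional indexing such as `dic[i][3]` or `dic[i][-1][-1]` is easy to get wrong. A sketch of the same record as a dataclass, assuming the field order the full code builds (href, score, play count, cover URL, then the stream URLs); `RankEntry` is a hypothetical name, and the original script keeps plain lists.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RankEntry:
    """One leaderboard video; mirrors the list stored under each title in `dic`."""
    href: str
    score: str
    play_volume: str
    img_url: str = ''
    media_urls: List[str] = field(default_factory=list)  # [video, audio] or [video]

    @classmethod
    def from_list(cls, values):
        """Build from the positional list used in the original script."""
        return cls(*values)

    @property
    def has_separate_audio(self):
        """True for DASH videos, whose audio must be merged with ffmpeg."""
        return len(self.media_urls) == 2
```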

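You may also have noticed the sleep calls in the code. Why? Because we play fair (讲武德): random pauses between requests keep the load we put on Bilibili's servers light.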
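The script pauses with `sleep(randint(3, 5))` between page requests and `sleep(randint(5, 8))` after each video. A sketch that wraps this in one hypothetical helper, `polite_pause`, so the delay range is set in one place; the `sleep` and `rng` parameters exist only so it can be exercised without actually waiting.

```python
import random
import time


def polite_pause(low=3, high=5, sleep=time.sleep, rng=random):
    """Sleep a random whole number of seconds in [low, high]; return the delay.

    Equivalent to sleep(randint(low, high)) in the original script.
    """
    delay = rng.randint(low, high)  # randint includes both endpoints
    sleep(delay)
    return delay
```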

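Here is what I scraped (the video carries the Bilibili logo watermark):

The original scraped video:

The scraped audio (the platform does not allow uploading the original .mp4 file, so I changed it to .mp3 to upload it):

Cover:

info:

特朗普:不装了,摊牌了,我中文贼好! (Trump: "No more pretending, I'll come clean: my Chinese is great!")

3985390 composite score

435.8万 (4.358 million plays)

Merged video: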

That's all for this scraping tutorial. If anything is unclear, feel free to ask in the comments below.


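If you want to learn about internet strategy products, search for the WeChat ID wangsanday and follow him; he writes about strategy, economics, the industry and life, with plenty of practical content.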


Note:

If the video infringes any rights, it will be removed immediately.


Article reposted from the WeChat official account spiders
