0%

eosvoter | 爬取矿池数据

先说一下我的两个思路。

  • 使用传统爬虫
  • 模仿浏览器

使用传统爬虫


经过对 btc.com 的分析,有一部分数据是 ajax 产生的。

如何辨别数据是哪里产生

红框中的数据是我想获取的

然后,使用 chrome 的开发者模式,查找到主页的排版是这样的:

看图,说明其并不是一个静态文件返回的,而是,动态加载的。

这时候考虑是否有端口可以获取,查找了相关的文件,发现并没有,所以,猜测是 ajax 得到的。

但是,现在这部分数据,还不着急,先分析主体数据。

当我,爬着爬着,突然就爬不上去了。

返回的页面数据如下:

1
2
3
4
5
6
7
8
9
10
<html>
<script>
var arg1='B96D7399184BAF63C2D93CE3A887B4B5E7EBD788';
var _0x4818=['\x63\x73\x4b\x48\x77\x71\x4d\x49','\x5a\x73\x4b\x4a\x77\x72\x38\x56\x65\x41\x73\x79','\x55\x63\x4b\x69\x4e\x38\x4f\x2f\x77\x70\x6c\x77\x4d\x41\x3d\x3d','\x4a\x52\x38\x43\x54\x67\x3d\x3d','\x59\x73\x4f\x6e\x62\x53\x45\x51\x77\x37\x6f\x7a\x77\x71\x5a\x4b\x65\x73\x4b\x55\x77\x37\x6b\x77\x58\x38\x4f\x52\x49\x51\x3d\x3d','\x77\x37\x6f\x56\x53\x38\x4f\x53\x77\x6f\x50\x43\x6c\x33\x6a\x43\x68\x4d\x4b\x68\x77\x36\x48\x44\x6c\x73\x4b\x58\x77\x34\x73\x2f\x59\x73\x4f\x47','\x66\x77\x56\x6d\x49\x31\x41\x74\x77\x70\x6c\x61\x59\x38\x4f\x74\x77\x35\x63\x4e\x66\x53\x67\x70\x77\x36\x4d\x3d','\x4f\x63\x4f\x4e\x77\x72\x6a\x43\x71\x73\x4b\x78\x54\x47\x54\x43\x68\x73\x4f\x6a\x45\x57\x45\x38\x50\x63\x4f\x63\x4a\x38\x4b\x36','\x55\x38\x4b\x35\x4c\x63\x4f\x74\x77\x70\x56\x30\x45\x4d\x4f\x6b\x77\x34\x37\x44\x72\x4d\x4f\x58','\x48\x4d\x4f\x32\x77\x6f\x48\x43\x69\x4d\x4b\x39\x53\x6c\x58\x43\x6c\x63\x4f\x6f\x43\x31\x6b\x3d','\x61\x73\x4b\x49\x77\x71\x4d\x44\x64\x67\x4d\x75\x50\x73\x4f\x4b\x42\x4d\x4b\x63\x77\x72\x72\x43\x74\x6b\x4c\x44\x72\x4d\x4b\x42\x77\x36\x34\x64','\x77\x71\x49\x6d\x4d\x54\x30\x74\x77\x36\x52\x4e\x77\x35\x6b\x3d','\x44\x4d\x4b\x63\x55\x30\x4a\x6d\x55\x77\x55\x76','\x56\x6a\x48\x44\x6c\x4d\x4f\x48\x56\x63\x4f\x4e\x58\x33\x66\x44\x69\x63\x4b\x4a\x48\x51\x3d\x3d','\x77\x71\x68\x42\x48\x38\x4b\x6e\x77\x34\x54\x44\x68\x53\x44\x44\x67\x4d\x4f\x64\x77\x72\x6a\x43\x6e\x63\x4f\x57\x77\x70\x68\x68\x4e\x38\x4b\x43\x47\x63\x4b\x71\x77\x36\x64\x48\x41\x55\x35\x2b\x77\x72\x67\x32\x4a\x63\x4b\x61\x77\x34\x49\x45\x4a\x63\x4f\x63\x77\x72\x52\x4a\x77\x6f\x5a\x30\x77\x71\x46\x39\x59\x67\x41\x56','\x64\x7a\x64\x32\x77\x35\x62\x44\x6d\x33\x6a\x44\x70\x73\x4b\x33\x77\x70\x59\x3d','\x77\x34\x50\x44\x67\x63\x4b\x58\x77\x6f\x33\x43\x6b\x63\x4b\x4c\x77\x72\x35\x71\x77\x72\x59\x3d','\x77\x72\x4a\x4f\x54\x63\x4f\x51\x57\x4d\x4f\x67','\x77\x71\x54\x44\x76\x63\x4f\x6a\x77\x34\x34\x37\x77\x72\x34\x3d','\x77\x35\x58\x44\x71\x73\x4b\x68\x4d\x46\x31\x2f','\x77\x72\x41\x79\x48\x73\x4f\x66\x77\x70\x70\x63','\x4a\x33\x64\x56\x50\x63\x4f\x78\x4c\x67\x3d\x3d','\x77\x72\x64\x48\x77\x37\x70\x39\x5a\x77\x3d\x3d','\x77\x34\x72\x44\x6f\x38\x4b\x6d\x4e\x45\x77\x3d','\x49\x4d\x4b\x41\x55\x6b\x42\x74','\x77\x36\x62\x44\x72\x63\x4b\x51\x77\x70\x56\x48\x77\x70\x4e\x51\x77\x71\x55\x3d','\x64\x38\x4f\x73\x57\x68\x41\x55\x77\x37\x59\x7a\x77\x72\x55\x3d','\x77\x71\x6e\x43\x6b\x73\x4f\x65\x65\x7a\x72\x44\x68\x77\x3d\x3d','\x55\x73\x4b\x6e\x49\x4d\x4b\x57\x56\x38\x4b\x2f','\x77\x34\x7a\x44\x6f\x63\x4b\x38\x4e\x55\x5a\x76','\x63\x38\x4f\x78\x5a\x68\x41\x4a\x77\x36\x73\x6b\x77\x71\x4a\x6a','\x50\x63\x4b\x49\x77\x34\x6e\x43\x6b\x6b\x56\x62','\x4b\x48\x67\x6f\x64\x4d\x4f\x32\x56\x51\x3d\x3d','\x77\x70\x73\x6d\x77\x71\x76\x44\x6e\x47\x46\x71','\x77\x71\x4c\x44\x74\x38\x4f\x6b\x77\x34\x63\x3d','\x77\x37\x77\x31\x77\x34\x50\x43\x70\x73\x4f\x34\x77\x71\x41\x3d','\x77\x71\x39\x46\x52\x73\x4f\x71\x57\x4d\x4f\x71','\x62\x79\x42\x68\x77\x37\x72\x44\x6d\x33\x34\x3d','\x4c\x48\x67\x2b\x53\x38\x4f\x74\x54\x77\x3d\x3d','\x77\x71\x68\x4f\x77\x37\x31\x35\x64\x73\x4f\x48','\x55\x38\x4f\x37\x56\x73\x4f\x30\x77\x71\x76\x44\x76\x63\x4b\x75\x4b\x73\x4f\x71\x58\x38\x4b\x72','\x59\x69\x74\x74\x77\x35\x44\x44\x6e\x57\x6e\x44\x72\x41\x3d\x3d','\x59\x4d\x4b\x49\x77\x71\x55\x55\x66\x67\x49\x6b','\x61\x42\x37\x44\x6c\x4d\x4f\x44\x54\x51\x3d\x3d','\x77\x70\x66\x44\x68\x38\x4f\x72\x77\x36\x6b\x6b','\x77\x37\x76\x43\x71\x4d\x4f\x72\x59\x38\x4b\x41\x56\x6b\x35\x4f\x77\x70\x6e\x43\x75\x38\x4f\x61\x58\x73\x4b\x5a\x50\x33\x44\x43\x6c\x63\x4b\x79\x77\x36\x48\x44\x72\x51\x3d\x3d','\x77\x6f\x77\x2b\x77\x36\x76\x44\x6d\x48\x70\x73\x77\x37\x52\x74\x77\x6f\x39\x38\x4c\x43\x37\x43\x69\x47\x37\x43\x6b\x73\x4f\x52\x54\x38\x4b\x6c\x57\x38\x4f\x35\x77\x72\x33\x44\x69\x38\x4f\x54\x48\x73\x4f\x44\x65\x48\x6a\x44\x6d\x63\x4b\x6c\x4a\x73\x4b\x71\x56\x41\x3d\x3d','\x4e\x77\x56\x2b','\x77\x37\x48\x44\x72\x63\x4b\x74\x77\x70\x4a\x61\x77\x70\x5a\x62','\x77\x70\x51\x73\x77\x71\x76\x44\x69\x48\x70\x75\x77\x36\x49\x3d','\x59\x4d\x4b\x55\x77\x71\x4d\x4a\x5a\x51\x3d\x3d','\x4b\x48\x31\x56\x4b\x63\x4f\x71\x4b\x73\x4b\x31','\x66\x51\x35\x73\x46\x55\x6b\x6b\x77\x70\x49\x3d','\x77\x72\x76\x43\x72\x63\x4f\x42\x52\x38\x4b\x6b','\x4d\x33\x77\x30\x66\x51\x3d\x3d','\x77\x36\x78\x58\x77\x71\x50\x44\x76\x4d\x4f\x46\x77\x6f\x35\x64'];(function(_0x4c97f0,_0x1742fd){var _0x4db1c=function(_0x48181e){while(--_0x48181e){_0x4c97f0['\x70\x75\x73\x68'](_0x4c97f0['\x73\x68\x69\x66\x74']());}};var _0x3cd6c6=function(){var _0xb8360b={'\x64\x61\x74\x61':{'\x6b\x65\x79':'\x63\x6f\x6f\x6b\x69\x65','\x76\x61\x6c\x75\x65':'\x74\x69\x6d\x65\x6f\x75\x74'},'\x73\x65\x74\x43\x6f\x6f\x6b\x69\x65':function(_0x20bf34,_0x3e840e,_0x5693d3,_0x5e8b26){_0x5e8b26=_0x5e8b26||{};var _0xba82f0=_0x3e840e+'\x3d'+_0x5693d3;var _0x5afe31=0x0;for(var _0x5afe31=0x0,_0x178627=_0x20bf34['\x6c\x65\x6e\x67\x74\x68'];_0x5afe31<_0x178627;_0x5afe31++){var _0x41b2ff=_0x20bf34[_0x5afe31];_0xba82f0+='\x3b\x20'+_0x41b2ff;var _0xd79219=_0x20bf34[_0x41b2ff];_0x20bf34['\x70\x75\x73\x68'](_0xd79219);_0x178627=_0x20bf34['\x6c\x65\x6e\x67\x74\x68'];if(_0xd79219!==!![]){_0xba82f0+='\x3d'+_0xd79219;}}_0x5e8b26['\x63\x6f\x6f\x6b\x69\x65']=_0xba82f0;},'\x72\x65\x6d\x6f\x76\x65\x43\x6f\x6f\x6b\x69\x65':function(){return'\x64\x65\x76';},'\x67\x65\x74\x43\x6f\x6f\x6b\x69\x65':function(_0x4a11fe,_0x189946){_0x4a11fe=_0x4a11fe||function(_0x6259a2){return _0x6259a2;};var _0x25af93=_0x4a11fe(new RegExp('\x28\x3f\x3a\x5e\x7c\x3b\x20\x29'+_0x189946['\x72\x65\x70\x6c\x61\x63\x65'](/([.$?*|{}()[]\/+^])/g,'\x24\x31')+'\x3d\x28\x5b\x5e\x3b\x5d\x2a\x29'));var _0x52d57c=function(_0x105f59,_0x3fd789){_0x105f59(++_0x3fd789);};_0x52d57c(_0x4db1c,_0x1742fd);return _0x25af93?decodeURIComponent(_0x25af93[0x1]):undefined;}};var _0x4a2aed=function(){var _0x124d17=new RegExp('\x5c\x77\x2b\x20\x2a\x5c\x28\x5c\x29\x20\x2a\x7b\x5c\x77\x2b\x20\x2a\x5b\x27\x7c\x22\x5d\x2e\x2b\x5b\x27\x7c\x22\x5d\x3b\x3f\x20\x2a\x7d');return _0x124d17['\x74\x65\x73\x74'](_0xb8360b['\x72\x65\x6d\x6f\x76\x65\x43\x6f\x6f\x6b\x69\x65']['\x74\x6f\x53\x74\x72\x69\x6e\x67']());};_0xb8360b['\x75\x70\x64\x61\x74\x65\x43\x6f\x6f\x6b\x69\x65']=_0x4a2aed;var _0x2d67ec='';var _0x120551=_0xb8360b['\x75\x70\x64\x61\x74\x65\x43\x6f\x6f\x6b\x69\x65']();if(!_0x120551){_0xb8360b['\x73\x65\x74\x43\x6f\x6f\x6b\x69\x65'](['\x2a'],'\x63\x6f\x75\x6e\x74\x65\x72',0x1);}else if(_0x120551){_0x2d67ec=_0xb8360b['\x67\x65\x74\x43\x6f\x6f\x6b\x69\x65'](null,'\x63\x6f\x75\x6e\x74\x65\x72');}else{_0xb8360b['\x72\x65\x6d\x6f\x76\x65\x43\x6f\x6f\x6b\x69\x65']();}};_0x3cd6c6();}(_0x4818,0x15b));var _0x55f3=function(_0x4c97f0,_0x1742fd){var _0x4c97f0=parseInt(_0x4c97f0,0x10);var _0x48181e=_0x4818[_0x4c97f0];if(!_0x55f3['\x61\x74\x6f\x62\x50\x6f\x6c\x79\x66\x69\x6c\x6c\x41\x70\x70\x65\x6e\x64\x65\x64']){(function(){var _0xdf49c6=Function('\x72\x65\x74\x75\x72\x6e\x20\x28\x66\x75\x6e\x63\x74\x69\x6f\x6e\x20\x28\x29\x20'+'\x7b\x7d\x2e\x63\x6f\x6e\x73\x74\x72\x75\x63\x74\x6f\x72\x28\x22\x72\x65\x74\x75\x72\x6e\x20\x74\x68\x69\x73\x22\x29\x28\x29'+'\x29\x3b');var _0xb8360b=_0xdf49c6();var _0x389f44='\x41\x42\x43\x44\x45\x46\x47\x48\x49\x4a\x4b\x4c\x4d\x4e\x4f\x50\x51\x52\x53\x54\x55\x56\x57\x58\x59\x5a\x61\x62\x63\x64\x65\x66\x67\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f\x70\x71\x72\x73\x74\x75\x76\x77\x78\x79\x7a\x30\x31\x32\x33\x34\x35\x36\x37\x38\x39\x2b\x2f\x3d';_0xb8360b['\x61\x74\x6f\x62']||(_0xb8360b['\x61\x74\x6f\x62']=function(_0xba82f0){var _0xec6bb4=String(_0xba82f0)['\x72\x65\x70\x6c\x61\x63\x65'](/=+$/,'');for(var _0x1a0f04=0x0,_0x18c94e,_0x41b2ff,_0xd79219=0x0,_0x5792f7='';_0x41b2ff=_0xec6bb4['\x63\x68\x61\x72\x41\x74'](_0xd79219++);~_0x41b2ff&&(_0x18c94e=_0x1a0f04%0x4?_0x18c94e*0x40+_0x41b2ff:_0x41b2ff,_0x1a0f04++%0x4)?_0x5792f7+=String['\x66\x72\x6f\x6d\x43\x68\x61\x72\x43\x6f\x64\x65'](0xff&_0x18c94e>>(-0x2*_0x1a0f04&0x6)):0x0){_0x41b2ff=_0x389f44['\x69\x6e\x64\x65\x78\x4f\x66'](_0x41b2ff);}return _0x5792f7;});}());_0x55f3['\x61\x74\x6f\x62\x50\x6f\x6c\x79\x66\x69\x6c\x6c\x41\x70\x70\x65\x6e\x64\x65\x64']=!![];}if(!_0x55f3['\x72\x63\x34']){var _0x232678=function(_0x401af1,_0x532ac0){var _0x45079a=[],_0x52d57c=0x0,_0x105f59,_0x3fd789='',_0x4a2aed='';_0x401af1=atob(_0x401af1);for(var _0x124d17=0x0,_0x1b9115=_0x401af1['\x6c\x65\x6e\x67\x74\x68'];_0x124d17<_0x1b9115;_0x124d17++){_0x4a2aed+='\x25'+('\x30\x30'+_0x401af1['\x63\x68\x61\x72\x43\x6f\x64\x65\x41\x74'](_0x124d17)['\x74\x6f\x53\x74\x72\x69\x6e\x67'](0x10))['\x73\x6c\x69\x63\x65'](-0x2);}_0x401af1=decodeURIComponent(_0x4a2aed);for(var _0x2d67ec=0x0;_0x2d67ec<0x100;_0x2d67ec++){_0x45079a[_0x2d67ec]=_0x2d67ec;}for(_0x2d67ec=0x0;_0x2d67ec<0x100;_0x2d67ec++){_0x52d57c=(_0x52d57c+_0x45079a[_0x2d67ec]+_0x532ac0['\x63\x68\x61\x72\x43\x6f\x64\x65\x41\x74'](_0x2d67ec%_0x532ac0['\x6c\x65\x6e\x67\x74\x68']))%0x100;_0x105f59=_0x45079a[_0x2d67ec];_0x45079a[_0x2d67ec]=_0x45079a[_0x52d57c];_0x45079a[_0x52d57c]=_0x105f59;}_0x2d67ec=0x0;_0x52d57c=0x0;for(var _0x4e5ce2=0x0;_0x4e5ce2<_0x401af1['\x6c\x65\x6e\x67\x74\x68'];_0x4e5ce2++){_0x2d67ec=(_0x2d67ec+0x1)%0x100;_0x52d57c=(_0x52d57c+_0x45079a[_0x2d67ec])%0x100;_0x105f59=_0x45079a[_0x2d67ec];_0x45079a[_0x2d67ec]=_0x45079a[_0x52d57c];_0x45079a[_0x52d57c]=_0x105f59;_0x3fd789+=String['\x66\x72\x6f\x6d\x43\x68\x61\x72\x43\x6f\x64\x65'](_0x401af1['\x63\x68\x61\x72\x43\x6f\x64\x65\x41\x74'](_0x4e5ce2)^_0x45079a[(_0x45079a[_0x2d67ec]+_0x45079a[_0x52d57c])%0x100]);}return _0x3fd789;};_0x55f3['\x72\x63\x34']=_0x232678;}if(!_0x55f3['\x64\x61\x74\x61']){_0x55f3['\x64\x61\x74\x61']={};}if(_0x55f3['\x64\x61\x74\x61'][_0x4c97f0]===undefined){if(!_0x55f3['\x6f\x6e\x63\x65']){var _0x5f325c=function(_0x23a392){this['\x72\x63\x34\x42\x79\x74\x65\x73']=_0x23a392;this['\x73\x74\x61\x74\x65\x73']=[0x1,0x0,0x0];this['\x6e\x65\x77\x53\x74\x61\x74\x65']=function(){return'\x6e\x65\x77\x53\x74\x61\x74\x65';};this['\x66\x69\x72\x73\x74\x53\x74\x61\x74\x65']='\x5c\x77\x2b\x20\x2a\x5c\x28\x5c\x29\x20\x2a\x7b\x5c\x77\x2b\x20\x2a';this['\x73\x65\x63\x6f\x6e\x64\x53\x74\x61\x74\x65']='\x5b\x27\x7c\x22\x5d\x2e\x2b\x5b\x27\x7c\x22\x5d\x3b\x3f\x20\x2a\x7d';};_0x5f325c['\x70\x72\x6f\x74\x6f\x74\x79\x70\x65']['\x63\x68\x65\x63\x6b\x53\x74\x61\x74\x65']=function(){var _0x19f809=new RegExp(this['\x66\x69\x72\x73\x74\x53\x74\x61\x74\x65']+this['\x73\x65\x63\x6f\x6e\x64\x53\x74\x61\x74\x65']);return this['\x72\x75\x6e\x53\x74\x61\x74\x65'](_0x19f809['\x74\x65\x73\x74'](this['\x6e\x65\x77\x53\x74\x61\x74\x65']['\x74\x6f\x53\x74\x72\x69\x6e\x67']())?--this['\x73\x74\x61\x74\x65\x73'][0x1]:--this['\x73\x74\x61\x74\x65\x73'][0x0]);};_0x5f325c['\x70\x72\x6f\x74\x6f\x74\x79\x70\x65']['\x72\x75\x6e\x53\x74\x61\x74\x65']=function(_0x4380bd){if(!Boolean(~_0x4380bd)){return _0x4380bd;}return this['\x67\x65\x74\x53\x74\x61\x74\x65'](this['\x72\x63\x34\x42\x79\x74\x65\x73']);};_0x5f325c['\x70\x72\x6f\x74\x6f\x74\x79\x70\x65']['\x67\x65\x74\x53\x74\x61\x74\x65']=function(_0x58d85e){for(var _0x1c9f5b=0x0,_0x1ce9e0=this['\x73\x74\x61\x74\x65\x73']['\x6c\x65\x6e\x67\x74\x68'];_0x1c9f5b<_0x1ce9e0;_0x1c9f5b++){this['\x73\x74\x61\x74\x65\x73']['\x70\x75\x73\x68'](Math['\x72\x6f\x75\x6e\x64'](Math['\x72\x61\x6e\x64\x6f\x6d']()));_0x1ce9e0=this['\x73\x74\x61\x74\x65\x73']['\x6c\x65\x6e\x67\x74\x68'];}return _0x58d85e(this['\x73\x74\x61\x74\x65\x73'][0x0]);};new _0x5f325c(_0x55f3)['\x63\x68\x65\x63\x6b\x53\x74\x61\x74\x65']();_0x55f3['\x6f\x6e\x63\x65']=!![];}_0x48181e=_0x55f3['\x72\x63\x34'](_0x48181e,_0x1742fd);_0x55f3['\x64\x61\x74\x61'][_0x4c97f0]=_0x48181e;}else{_0x48181e=_0x55f3['\x64\x61\x74\x61'][_0x4c97f0];}return _0x48181e;};var arg3=null;var arg4=null;var arg5=null;var arg6=null;var arg7=null;var arg8=null;var arg9=null;var arg10=null;var l=function(){while(window[_0x55f3('0x1', '\x58\x4d\x57\x5e')]||window['\x5f\x5f\x70\x68\x61\x6e\x74\x6f\x6d\x61\x73']){};var _0x5e8b26=_0x55f3('0x3', '\x6a\x53\x31\x59');String[_0x55f3('0x5', '\x6e\x5d\x66\x52')][_0x55f3('0x6', '\x50\x67\x35\x34')]=function(_0x4e08d8){var _0x5a5d3b='';for(var _0xe89588=0x0;_0xe89588<this[_0x55f3('0x8', '\x29\x68\x52\x63')]&&_0xe89588<_0x4e08d8[_0x55f3('0xa', '\x6a\x45\x26\x5e')];_0xe89588+=0x2){var _0x401af1=parseInt(this[_0x55f3('0xb', '\x56\x32\x4b\x45')](_0xe89588,_0xe89588+0x2),0x10);var _0x105f59=parseInt(_0x4e08d8[_0x55f3('0xd', '\x58\x4d\x57\x5e')](_0xe89588,_0xe89588+0x2),0x10);var _0x189e2c=(_0x401af1^_0x105f59)[_0x55f3('0xf', '\x57\x31\x46\x45')](0x10);if(_0x189e2c[_0x55f3('0x11', '\x4d\x47\x72\x76')]==0x1){_0x189e2c='\x30'+_0x189e2c;}_0x5a5d3b+=_0x189e2c;}return _0x5a5d3b;};String['\x70\x72\x6f\x74\x6f\x74\x79\x70\x65'][_0x55f3('0x14', '\x5a\x2a\x44\x4d')]=function(){var _0x4b082b=[0xf,0x23,0x1d,0x18,0x21,0x10,0x1,0x26,0xa,0x9,0x13,0x1f,0x28,0x1b,0x16,0x17,0x19,0xd,0x6,0xb,0x27,0x12,0x14,0x8,0xe,0x15,0x20,0x1a,0x2,0x1e,0x7,0x4,0x11,0x5,0x3,0x1c,0x22,0x25,0xc,0x24];var _0x4da0dc=[];var _0x12605e='';for(var _0x20a7bf=0x0;_0x20a7bf<this['\x6c\x65\x6e\x67\x74\x68'];_0x20a7bf++){var _0x385ee3=this[_0x20a7bf];for(var _0x217721=0x0;_0x217721<_0x4b082b[_0x55f3('0x16', '\x61\x48\x2a\x4e')];_0x217721++){if(_0x4b082b[_0x217721]==_0x20a7bf+0x1){_0x4da0dc[_0x217721]=_0x385ee3;}}}_0x12605e=_0x4da0dc['\x6a\x6f\x69\x6e']('');return _0x12605e;};var _0x23a392=arg1[_0x55f3('0x19', '\x50\x67\x35\x34')]();arg2=_0x23a392[_0x55f3('0x1b', '\x7a\x35\x4f\x26')](_0x5e8b26);setTimeout('\x72\x65\x6c\x6f\x61\x64\x28\x61\x72\x67\x32\x29',0x2);};var _0x4db1c=function(){function _0x355d23(_0x450614){if((''+_0x450614/_0x450614)[_0x55f3('0x1c', '\x56\x32\x4b\x45')]!==0x1||_0x450614%0x14===0x0){(function(){}[_0x55f3('0x1d', '\x43\x4e\x55\x59')]((undefined+'')[0x2]+(!![]+'')[0x3]+([][_0x55f3('0x1e', '\x77\x38\x50\x52')]()+'')[0x2]+(undefined+'')[0x0]+(![]+[0x0]+String)[0x14]+(![]+[0x0]+String)[0x14]+(!![]+'')[0x3]+(!![]+'')[0x1])());}else{(function(){}['\x63\x6f\x6e\x73\x74\x72\x75\x63\x74\x6f\x72']((undefined+'')[0x2]+(!![]+'')[0x3]+([][_0x55f3('0x1f', '\x4c\x24\x28\x44')]()+'')[0x2]+(undefined+'')[0x0]+(![]+[0x0]+String)[0x14]+(![]+[0x0]+String)[0x14]+(!![]+'')[0x3]+(!![]+'')[0x1])());}_0x355d23(++_0x450614);}try{_0x355d23(0x0);}catch(_0x54c483){}};if(function(){var _0x470d8f=function(){var _0x4c97f0=!![];return function(_0x1742fd,_0x4db1c){var _0x48181e=_0x4c97f0?function(){if(_0x4db1c){var _0x55f3be=_0x4db1c['\x61\x70\x70\x6c\x79'](_0x1742fd,arguments);_0x4db1c=null;return _0x55f3be;}}:function(){};_0x4c97f0=![];return _0x48181e;};}();var _0x501fd7=_0x470d8f(this,function(){var _0x4c97f0=function(){return'\x64\x65\x76';},_0x1742fd=function(){return'\x77\x69\x6e\x64\x6f\x77';};var _0x55f3be=function(){var _0x3ad9a1=new RegExp('\x5c\x77\x2b\x20\x2a\x5c\x28\x5c\x29\x20\x2a\x7b\x5c\x77\x2b\x20\x2a\x5b\x27\x7c\x22\x5d\x2e\x2b\x5b\x27\x7c\x22\x5d\x3b\x3f\x20\x2a\x7d');return!_0x3ad9a1['\x74\x65\x73\x74'](_0x4c97f0['\x74\x6f\x53\x74\x72\x69\x6e\x67']());};var _0x1b93ad=function(){var _0x20bf34=new RegExp('\x28\x5c\x5c\x5b\x78\x7c\x75\x5d\x28\x5c\x77\x29\x7b\x32\x2c\x34\x7d\x29\x2b');return _0x20bf34['\x74\x65\x73\x74'](_0x1742fd['\x74\x6f\x53\x74\x72\x69\x6e\x67']());};var _0x5afe31=function(_0x178627){var _0x1a0f04=~-0x1>>0x1+0xff%0x0;if(_0x178627['\x69\x6e\x64\x65\x78\x4f\x66']('\x69'===_0x1a0f04)){_0xd79219(_0x178627);}};var _0xd79219=function(_0x5792f7){var _0x4e08d8=~-0x4>>0x1+0xff%0x0;if(_0x5792f7['\x69\x6e\x64\x65\x78\x4f\x66']((!![]+'')[0x3])!==_0x4e08d8){_0x5afe31(_0x5792f7);}};if(!_0x55f3be()){if(!_0x1b93ad()){_0x5afe31('\x69\x6e\x64\u0435\x78\x4f\x66');}else{_0x5afe31('\x69\x6e\x64\x65\x78\x4f\x66');}}else{_0x5afe31('\x69\x6e\x64\u0435\x78\x4f\x66');}});_0x501fd7();var _0x3a394d=function(){var _0x1ab151=!![];return function(_0x372617,_0x42d229){var _0x3b3503=_0x1ab151?function(){if(_0x42d229){var _0x7086d9=_0x42d229[_0x55f3('0x21', '\x4b\x4e\x29\x46')](_0x372617,arguments);_0x42d229=null;return _0x7086d9;}}:function(){};_0x1ab151=![];return _0x3b3503;};}();var _0x5b6351=_0x3a394d(this,function(){var _0x46cbaa=Function(_0x55f3('0x22', '\x26\x68\x5a\x59')+_0x55f3('0x23', '\x61\x48\x2a\x4e')+'\x29\x3b');var _0x1766ff=function(){};var _0x9b5e29=_0x46cbaa();_0x9b5e29[_0x55f3('0x26', '\x61\x48\x2a\x4e')]['\x6c\x6f\x67']=_0x1766ff;_0x9b5e29[_0x55f3('0x29', '\x56\x25\x59\x52')][_0x55f3('0x2a', '\x50\x5e\x45\x71')]=_0x1766ff;_0x9b5e29[_0x55f3('0x2c', '\x6c\x67\x4d\x30')][_0x55f3('0x2d', '\x4c\x24\x28\x44')]=_0x1766ff;_0x9b5e29[_0x55f3('0x2f', '\x43\x5a\x63\x38')][_0x55f3('0x30', '\x57\x75\x36\x25')]=_0x1766ff;});_0x5b6351();try{return!!window['\x61\x64\x64\x45\x76\x65\x6e\x74\x4c\x69\x73\x74\x65\x6e\x65\x72'];}catch(_0x35538d){return![];}}()){document[_0x55f3('0x33', '\x56\x25\x59\x52')](_0x55f3('0x34', '\x79\x41\x70\x7a'),l,![]);}else{document[_0x55f3('0x36', '\x79\x41\x70\x7a')](_0x55f3('0x37', '\x4c\x24\x28\x44'),l);}_0x4db1c();setInterval(function(){_0x4db1c();},0xfa0);

function setCookie(name,value){var expiredate=new Date();expiredate.setTime(expiredate.getTime()+(3600*1000));document.cookie=name+"="+value+";expires="+expiredate.toGMTString()+";max-age=3600;path=/";}
function reload(x) {setCookie("acw_sc__v2", x);document.location.reload();}
</script>

</html>

经过大规模的测试,发现这部分代码是这样的

  • 有一个加密 cookie,来防止反爬
  • cookie 有一个 X 值是我们获取不到的,所以,不能伪造 cookie
  • 不用 cookie 也可以使用爬虫,但是,只能爬取几页就被封了,封的时常差不多是一个小时左右
  • 被封的电脑即便是换了 IP 也不能继续爬,初步猜测是和机器码有关,但是,没有验证

所以,使用传统方法爬这个路子暂时行不通。

其 cookie 的样子是这样的

ps: 我尝试过用 python 执行 js 代码,但是,最后的密钥我还是破解不了,失败… …


模拟浏览器


所以,经过查询资料,我打算使用模拟浏览器的方法。

因为我的电脑是 MacBook,所以,我的方案最终如下

MacBook + python3 + chrome + chrome驱动 + selenium

其中 selenium 是 python 的一个模块

安装 selenium

可以使用

pip install -i https://pypi.tuna.tsinghua.edu.cn/simple selenium

安装 chrome 驱动

你应该事先先安装好 chrome 浏览器。

然后在其设置中,找到你是什么版本的 chrome。

比如,我的是

版本 81.0.4044.138(正式版本) (64 位)

也就是 81。

在上面的官网上找到属于自己版本的驱动,然后根据自己的系统下载下来。

下载下来后,我们需要把这个内核驱动放在 /usr/bin 中。

关于如何放置,你可以参考我下面的文章。

macbook | 更改系统级目录的权限

配置完成后,就可以在命令行下执行chromedriver命令了。

试试在终端输入指令:

chromedriver

出现

Starting ChromeDriver 81.0.4044.69 (6813546031a4bc83f717a2ef7cd4ac6ec1199132-refs/branch-heads/4044@{#776}) on port 9515
Only local connections are allowed.
Please protect ports used by ChromeDriver and related test frameworks to prevent access by malicious code.

这个只是测试,在实际写代码的时候,不需要开启这个。

ps: 2021-8-18

我的 mac 系统升级到 11 之后,就不能再把 chromedriver 移到 /usr/bin 了。

会报:

Operation not permitted

所以,我将其放在 /usr/local/bin 中。

但是,如果你直接使用 chromedriver 会出现

由于无法验证开发人员,因此无法打开“ chromedriver”。无法启动 Chrome`浏览器

这个时候,你可以进入到 chromedriver 的存放目录,执行

xattr -d com.apple.quarantine chromedriver 

就可以了。

在python程序中测试

执行以下代码,看看是否会打开google

1
2
3
4
from selenium import webdriver

browser = webdriver.Chrome()
browser.get('http://www.google.com/')

ps:不需要在终端事先输入 chromedriver

如果,一切无误的话,应该可以成功开启。

爬取网页

参考资料

有一件令人惊喜的是,由于模拟了浏览器,其实也不是模拟浏览器。

Selenium 本身是自动化运维工具,其本来就是在浏览器打开网页,然后模拟人的操作,当然,网页的源代码也会相应的传到代码中。

但是,这么做对我来说有两个好处

  • 浏览器可以自动加载 ajax 数据,然后传给我
  • cookie 也是浏览器自动加的,我也不需要解密了

总之,网页上呈现的数据我都可以抓到。

下面的工作就是如何解析了。

这里,我用的是 Beautifulsoup

解析网页

需要注意的是,访问 btc.com 需要走科学上网路线。

其实解析就非常简单了,我下面举一个简单的例子,你们就会了。

1
2
3
4
5
6
7
8
9
10
11
12
from selenium import webdriver

browser = webdriver.Chrome()
response = browser.page_source
html = BeautifulSoup(response, "html.parser", from_encoding="utf-8")
...

解析网页

...

browser.close()

需要注意的是,这段程序会自动打开 chrome,然后进行操作。

ps: 这一博文几乎没怎么用 selenium ,只是借助了它的返回页面的操作。

这个库真正的用处在于模拟人的操作,包括验证码滑动等。

当然,由于部署在服务器端,不需要开启窗口环境,所以,需要自己设置一些限制参数,比如我的是

1
2
3
4
5
6
options = webdriver.ChromeOptions()
options.add_argument("--no-sandbox")
options.add_argument('--headless')
options.add_argument('--disable-gpu')
options.add_argument('blink-settings=imagesEnabled=false')
browser = webdriver.Chrome(chrome_options=options)

更新 2020-6-8


将项目放在新的服务器上,出现了新的问题。

selenium 的得到的网页数据拿不到 ajax 的数据了。

最后的解决办法,我进行了人的模拟滑动操作。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
options = webdriver.ChromeOptions()
options.add_argument("--no-sandbox")
options.add_argument('--headless')
options.add_argument('--disable-gpu')
options.add_argument('blink-settings=imagesEnabled=false')
browser = webdriver.Chrome(chrome_options=options)

net_info = dict()
browser.get(root_url)
time.sleep(2)
browser.execute_script("window.scrollTo(0,document.body.scrollHeight)")
time.sleep(2)
response = browser.page_source
pool_tables = BeautifulSoup(response, "html.parser", from_encoding="utf-8").find_all('div', class_="panel-body")

...
...
请我喝杯咖啡吧~