[转载]php 整洁之道

clean-code-php

Table of Contents

  1. 介绍
  2. 变量
  3. 函数

介绍

本文由 yangweijie 翻译自clen php code,团建用,欢迎大家指正。

摘录自 Robert C. Martin的Clean Code 书中的软件工程师的原则 ,适用于PHP。 这不是风格指南。 这是一个关于开发可读、可复用并且可重构的PHP软件指南。

并不是这里所有的原则都得遵循,甚至很少的能被普遍接受。 这些虽然只是指导,但是都是Clean Code作者多年总结出来的。

Inspired from clean-code-javascript

变量

使用有意义且可拼写的变量名

Bad:

$ymdstr = $moment->format('y-m-d');

Good:

$currentDate = $moment->format('y-m-d');

⬆ 返回顶部

同种类型的变量使用相同词汇

Bad:

getUserInfo();
getClientData();
getCustomerRecord();

Good:

getUser();

⬆ 返回顶部

使用易检索的名称

我们会读比写要多的代码。通过是命名易搜索,让我们写出可读性和易搜索代码很重要。

Bad:

// What the heck is 86400 for?
addExpireAt(86400);

Good:

// Declare them as capitalized `const` globals.
interface DateGlobal {
  const SECONDS_IN_A_DAY = 86400;
}

addExpireAt(DateGlobal::SECONDS_IN_A_DAY);

⬆ 返回顶部

使用解释型变量

Bad:

$address = 'One Infinite Loop, Cupertino 95014';
$cityZipCodeRegex = '/^[^,\\]+[,\\\s]+(.+?)\s*(\d{5})?$/';
preg_match($cityZipCodeRegex, $address, $matches);
saveCityZipCode($matches[1], $matches[2]);

Good:

$address = 'One Infinite Loop, Cupertino 95014';
$cityZipCodeRegex = '/^[^,\\]+[,\\\s]+(.+?)\s*(\d{5})?$/';
preg_match($cityZipCodeRegex, $address, $matches);
list(, $city, $zipCode) = $matchers;
saveCityZipCode($city, $zipCode);

⬆ 返回顶部

避免心理映射

明确比隐性好。

Bad:

$l = ['Austin', 'New York', 'San Francisco'];
foreach($i=0; $i<count($l); $i++) {
  oStuff();
  doSomeOtherStuff();
  // ...
  // ...
  // ...
  // 等等`$l` 又代表什么?
  dispatch($l);
}

Good:

$locations = ['Austin', 'New York', 'San Francisco'];
foreach($i=0; $i<count($locations); $i++) {
  $location = $locations[$i];
  
  doStuff();
  doSomeOtherStuff();
  // ...
  // ...
  // ...
  dispatch($location);
});

⬆ 返回顶部

不要添加不必要上下文

如果你的class/object 名能告诉你什么,不要把它重复在你变量名里。

Bad:

$car = [
  'carMake'  => 'Honda',
  'carModel' => 'Accord',
  'carColor' => 'Blue',
];

function paintCar(&$car) {
  $car['carColor'] = 'Red';
}

Good:

$car = [
  'make'  => 'Honda',
  'model' => 'Accord',
  'color' => 'Blue',
];

function paintCar(&$car) {
  $car['color'] = 'Red';
}

⬆ 返回顶部

###使用参数默认值代替短路或条件语句。 Bad:

function createMicrobrewery($name = null) {
  $breweryName = $name ?: 'Hipster Brew Co.';
  // ...
}

Good:

function createMicrobrewery($breweryName = 'Hipster Brew Co.') {
  // ...
}

⬆ 返回顶部

函数

函数参数最好少于2个

限制函数参数个数极其重要因为它是你函数测试容易点。有超过3个可选参数参数导致一个爆炸式组合增长,你会有成吨独立参数情形要测试。

无参数是理想情况。1个或2个都可以,最好避免3个。再多旧需要加固了。通常如果你的函数有超过两个参数,说明他多做了一些事。 在参数少的情况里,大多数时候一个高级别对象(数组)作为参数就足够应付。

Bad:

function createMenu($title, $body, $buttonText, $cancellable) {
  // ...
}

Good:

class menuConfig() {
  public $title;
  public $body;
  public $buttonText;
  public $cancellable = false;
}

$config = new MenuConfig();
$config->title = 'Foo';
$config->body = 'Bar';
$config->buttonText = 'Baz';
$config->cancellable = true;

function createMenu(MenuConfig $config) {
  // ...
}

⬆ 返回顶部

函数应该只做一件事

这是迄今为止软件工程里最重要的一个规则。当函数做超过一件事的时候,他们就难于实现、测试和理解。当你隔离函数只剩一个功能时,他们就容易被重构,然后你的代码读起来就更清晰。如果你光遵循这条规则,你就领先于大多数开发者了。

Bad:

function emailClients($clients) {
  foreach ($clients as $client) {
    $clientRecord = $db->find($client);
    if($clientRecord->isActive()) {
       email($client);
    }
  }
}

Good:

function emailClients($clients) {
  $activeClients = activeClients($clients);
  array_walk($activeClients, 'email');
}

function activeClients($clients) {
  return array_filter($clients, 'isClientActive');
}

function isClientActive($client) {
  $clientRecord = $db->find($client);
  return $clientRecord->isActive();
}

⬆ 返回顶部

函数名应当描述他们所做的事

Bad:

function addToDate($date, $month) {
  // ...
}

$date = new \DateTime();

// It's hard to to tell from the function name what is added
addToDate($date, 1);

Good:

function addMonthToDate($month, $date) {
  // ...
}

$date = new \DateTime();
addMonthToDate(1, $date);

⬆ 返回顶部

函数应当只为一层抽象,当你超过一层抽象时,函数正在做多件事。拆分功能易达到可重用性和易用性。.

Bad:

function parseBetterJSAlternative($code) {
  $regexes = [
    // ...
  ];

  $statements = split(' ', $code);
  $tokens = [];
  foreach($regexes as $regex) {
    foreach($statements as $statement) {
      // ...
    }
  }
  
  $ast = [];
  foreach($tokens as $token) {
     // lex...
  }

  foreach($ast as $node) {
   // parse...
  }
}

Good:

function tokenize($code) {
  $regexes = [
    // ...
  ];

  $statements = split(' ', $code);
  $tokens = [];
  foreach($regexes as $regex) {
    foreach($statements as $statement) {
      $tokens[] = /* ... */;
    });
  });

  return tokens;
}

function lexer($tokens) {
  $ast = [];
  foreach($tokens as $token) {
    $ast[] = /* ... */;
  });

  return ast;
}

function parseBetterJSAlternative($code) {
  $tokens = tokenize($code);
  $ast = lexer($tokens);
  foreach($ast as $node) {
    // parse...
  });
}

⬆ 返回顶部

删除重复的代码

尽你最大的努力来避免重复的代码。重复代码不好,因为它意味着如果你修改一些逻辑,那就有不止一处地方要同步修改了。

想象一下如果你经营着一家餐厅并跟踪它的库存: 你全部的西红柿、洋葱、大蒜、香料等。如果你保留有多个列表,当你服务一个有着西红柿的菜,那么所有记录都得更新。如果你只有一个列表,那么只需要修改一个地方!

经常你容忍重复代码,因为你有两个或更多有共同部分但是少许差异的东西强制你用两个或更多独立的函数来做相同的事。移除重复代码意味着创造一个处理这组不同事物的一个抽象,只需要一个函数/模块/类。

抽象正确非常重要,这也是为什么你应当遵循SOLID原则(奠定Class基础的原则)。坏的抽象可能比重复代码还要糟,因为要小心。在这个前提下,如果你可以抽象好,那就开始做把!不要重复你自己,否则任何你想改变一件事的时候你都发现在即在更新维护多处。

Bad:

function showDeveloperList($developers) {
  foreach($developers as $developer) {
    $expectedSalary = $developer->calculateExpectedSalary();
    $experience = $developer->getExperience();
    $githubLink = $developer->getGithubLink();
    $data = [
      $expectedSalary,
      $experience,
      $githubLink
    ];

    render($data);
  }
}

function showManagerList($managers) {
  foreach($managers as $manager) {
    $expectedSalary = $manager->calculateExpectedSalary();
    $experience = $manager->getExperience();
    $githubLink = $manager->getGithubLink();
    $data = [
      $expectedSalary,
      $experience,
      $githubLink
    ];

    render($data);
  }
}

Good:

function showList($employees) {
  foreach($employees as $employe) {
    $expectedSalary = $employe->calculateExpectedSalary();
    $experience = $employe->getExperience();
    $githubLink = $employe->getGithubLink();
    $data = [
      $expectedSalary,
      $experience,
      $githubLink
    ];

    render($data);
  }
}

⬆ 返回顶部

通过对象赋值设置默认值

Bad:

$menuConfig = [
  'title'       => null,
  'body'        => 'Bar',
  'buttonText'  => null,
  'cancellable' => true,
];

function createMenu(&$config) {
  $config['title']       = $config['title'] ?: 'Foo';
  $config['body']        = $config['body'] ?: 'Bar';
  $config['buttonText']  = $config['buttonText'] ?: 'Baz';
  $config['cancellable'] = $config['cancellable'] ?: true;
}

createMenu($menuConfig);

Good:

$menuConfig = [
  'title'       => 'Order',
  // User did not include 'body' key
  'buttonText'  => 'Send',
  'cancellable' => true,
];

function createMenu(&$config) {
  $config = array_merge([
    'title'       => 'Foo',
    'body'        => 'Bar',
    'buttonText'  => 'Baz',
    'cancellable' => true,
  ], $config);

  // config now equals: {title: "Order", body: "Bar", buttonText: "Send", cancellable: true}
  // ...
}

createMenu($menuConfig);

⬆ 返回顶部

不要用标志作为函数的参数,标志告诉你的用户函数做很多事了。函数应当只做一件事。 根据布尔值区别的路径来拆分你的复杂函数。

Bad:

function createFile(name, temp = false) {
  if (temp) {
    touch('./temp/'.$name);
  } else {
    touch($name);
  }
}

Good:

function createFile($name) {
  touch(name);
}

function createTempFile($name) {
  touch('./temp/'.$name);
}

⬆ 返回顶部

避免副作用

一个函数做了比获取一个值然后返回另外一个值或值们会产生副作用如果。副作用可能是写入一个文件,修改某些全局变量或者偶然的把你全部的钱给了陌生人。

现在,你的确需要在一个程序或者场合里要有副作用,像之前的例子,你也许需要写一个文件。你想要做的是把你做这些的地方集中起来。不要用几个函数和类来写入一个特定的文件。用一个服务来做它,一个只有一个。

重点是避免常见陷阱比如对象间共享无结构的数据,使用可以写入任何的可变数据类型,不集中处理副作用发生的地方。如果你做了这些你就会比大多数程序员快乐。

Bad:

// Global variable referenced by following function.
// If we had another function that used this name, now it'd be an array and it could break it.
$name = 'Ryan McDermott';

function splitIntoFirstAndLastName() {
  $name = preg_split('/ /', $name);
}

splitIntoFirstAndLastName();

var_dump($name); // ['Ryan', 'McDermott'];

Good:

$name = 'Ryan McDermott';

function splitIntoFirstAndLastName($name) {
  return preg_split('/ /', $name);
}

$name = 'Ryan McDermott';
$newName = splitIntoFirstAndLastName(name);

var_export($name); // 'Ryan McDermott';
var_export($newName); // ['Ryan', 'McDermott'];

⬆ 返回顶部

不要写全局函数

在大多数语言中污染全局变量是一个坏的实践,因为你可能和其他类库冲突并且你api的用户不明白为什么直到他们获得产品的一个异常。让我们看一个例子:如果你想配置一个数组,你可能会写一个全局函数像config(),但是可能和试着做同样事的其他类库冲突。这就是为什么单例设计模式和简单配置会更好的原因。

Bad:

function config() {
  return  [
    'foo': 'bar',
  ]
};

Good:

class Configuration {
  private static $instance;
  private function __construct($configuration) {/* */}
  public static function getInstance() {
     if(self::$instance === null) {
         self::$instance = new Configuration();
     }
     return self::$instance;
 }
 public function get($key) {/* */}
 public function getAll() {/* */}
}

$singleton = Configuration::getInstance();

⬆ 返回顶部

封装条件语句

Bad:

if ($fsm->state === 'fetching' && is_empty($listNode)) {
  // ...
}

Good:

function shouldShowSpinner($fsm, $listNode) {
  return $fsm->state === 'fetching' && is_empty(listNode);
}

if (shouldShowSpinner($fsmInstance, $listNodeInstance)) {
  // ...
}

⬆ 返回顶部

避免消极条件

Bad:

function isDOMNodeNotPresent($node) {
  // ...
}

if (!isDOMNodeNotPresent($node)) {
  // ...
}

Good:

function isDOMNodePresent($node) {
  // ...
}

if (isDOMNodePresent($node)) {
  // ...
}

⬆ 返回顶部

避免条件声明

这看起来像一个不可能任务。当人们第一次听到这句话是都会这么说。 “没有一个if声明” 答案是你可以使用多态来达到许多case语句里的任务。第二个问题很常见, “那么为什么我要那么做?” 答案是前面我们学过的一个整洁代码原则:一个函数应当只做一件事。当你有类和函数有很多if声明,你自己知道你的函数做了不止一件事。记住,只做一件事。

Bad:

class Airplane {
  // ...
  public function getCruisingAltitude() {
    switch (this.type) {
      case '777':
        return $this->getMaxAltitude() - $this->getPassengerCount();
      case 'Air Force One':
        return $this->getMaxAltitude();
      case 'Cessna':
        return $this->getMaxAltitude() - $this->getFuelExpenditure();
    }
  }
}

Good:

class Airplane {
  // ...
}

class Boeing777 extends Airplane {
  // ...
  public function getCruisingAltitude() {
    return $this->getMaxAltitude() - $this->getPassengerCount();
  }
}

class AirForceOne extends Airplane {
  // ...
  public function getCruisingAltitude() {
    return $this->getMaxAltitude();
  }
}

class Cessna extends Airplane {
  // ...
  public function getCruisingAltitude() {
    return $this->getMaxAltitude() - $this->getFuelExpenditure();
  }
}

⬆ 返回顶部

Avoid 避免类型检查 (part 1)

PHP是弱类型的,这意味着你的函数可以接收任何类型的参数。 有时候你为这自由所痛苦并且在你的函数渐渐尝试类型检查。有很多方法去避免这么做。第一种是考虑API的一致性。

Bad:

function travelToTexas($vehicle) {
  if ($vehicle instanceof Bicycle) {
    $vehicle->peddle($this->currentLocation, new Location('texas'));
  } else if ($vehicle instanceof Car) {
    $vehicle->drive($this->currentLocation, new Location('texas'));
  }
}

Good:

function travelToTexas($vehicle) {
  $vehicle->move($this->currentLocation, new Location('texas'));
}

⬆ 返回顶部

避免类型检查 (part 2)

如果你正使用基本原始值比如字符串、整形和数组,你不能用多态,你仍然感觉需要类型检测,你应当考虑类型声明或者严格模式。 这给你了基于标准PHP语法的静态类型。 手动检查类型的问题是做好了需要好多的废话,好像为了安全就可以不顾损失可读性。保持你的PHP 整洁,写好测试,做好代码回顾。做不到就用PHP严格类型声明和严格模式来确保安全。

Bad:

function combine($val1, $val2) {
  if (is_numeric($val1) && is_numeric(val2)) {
    return val1 + val2;
  }

  throw new \Exception('Must be of type Number');
}

Good:

function combine(int $val1, int $val2) {
  return $val1 + $val2;
}

⬆ 返回顶部

移除僵尸代码

僵尸代码和重复代码一样坏。没有理由保留在你的代码库中。如果从来被调用过,见鬼去!在你的版本库里是如果你仍然需要他的话,因此这么做很安全。

Bad:

function oldRequestModule($url) {
  // ...
}

function newRequestModule($url) {
  // ...
}

$req = new newRequestModule();
inventoryTracker('apples', $req, 'www.inventory-awesome.io');

Good:

function newRequestModule($url) {
  // ...
}

$req = new newRequestModule();
inventoryTracker('apples', $req, 'www.inventory-awesome.io');

⬆ 返回顶部

##有问题反馈 在使用中有任何问题,欢迎反馈给我,可以用以下联系方式跟我交流

[转载]FTP连接报错530 Permission denied解决方法

虚拟机装好RedHat后,准备使用filezilla连接,输入IP地址,root用户,密码,快速连接,报错:

530 Permission denied。

故障排除:

1.首先检查系统是否开启了vsftp服务,如果没有开启,先开启该服务。

2.查看配置

vsftpd的配置,配置文件中限定了vsftpd用户连接控制配置。
vsftpd.ftpusers:位于/etc/vsftpd目录下。它指定了哪些用户账户不能访问FTP服务器,例如root等。
vsftpd.user_list:位于/etc/vsftpd目录下。该文件里的用户账户在默认情况下也不能访问FTP服务器,仅当vsftpd .conf配置文件里启用userlist_enable=NO选项时才允许访问。
vsftpd.conf:位于/etc/vsftpd目录下。来自定义用户登录控制、用户权限控制、超时设置、服务器功能选项、服务器性能选项、服务器响应消息等FTP服务器的配置。

3.配置修改完成后,执行service vsftpd restart重启vsftpd服务。(注:如果为centos7系统,重启服务命令为:/bin/systemctl start vsftpd.service)

[转载]php使用Header函数,PHP_AUTH_PW和PHP_AUTH_USER做用户验证

[这篇文章主要介绍了php使用Header函数,PHP_AUTH_PW和PHP_AUTH_USER做用户验证的方法,结合实例形式分析了PHP使用Header函数调用登录验证及PHP_AUTH_PW和PHP_AUTH_USER进行验证处理的相关技巧,需要的朋友可以参考下]

本文实例讲述了php使用Header函数,PHP_AUTH_PW和PHP_AUTH_USER做用户验证的方法。分享给大家供大家参考,具体如下:

在php中,可以使用Header函数做一些有趣的事情,用户验证就是其中一个很有意思的功能。具体用法:

Header("WWW-Authenticate: Basic realm="USER LOGIN"");
Header("HTTP/1.0 401 Unauthorized");

在页首设计这两个Header函数,页面在载入前会出现一个登录框,要求输入用户名和密码。习惯了在页面登录的我们,是否觉得这样的登录很原始,又很新奇呢?

为了获取从这个对话框中传来的用户名和密码,需要用到php提供的两个特殊变量$PHP_AUTH_USER和$PHP_AUTH_PW,要这样使用这两个特殊变量好像需要在php.ini中设置相关的选项,不然就只能像下面这样引用:

$_SERVER['PHP_AUTH_USER']
$_SERVER['PHP_AUTH_PW']

获取到用户提交上来的用户名和密码之后,要怎样处理逻辑就跟我们一般的程序处理没有什么区别了。下面提供两个例程供参考:

<?php
if(!isset($PHP_AUTH_USER)) {
Header("WWW-authenticate: basic realm="XXX"");
Header("HTTP/1.0 401 Unauthorized");
$title="Login Instructions";
?>
<blockquote>
In order to enter this section of the web site, you must be an XXX
subscriber. If you are a subscriber and you are having trouble logging
in,
please contact <a href="mailto:[email protected]">[email protected]</a>.
</blockquote>
<?php
exit;
} else {
mysql_pconnect("localhost","nobody","") or die("Unable to connect to SQL server");
mysql_select_db("xxx") or die("Unable to select database");
$user_id=strtolower($PHP_AUTH_USER);
$password=$PHP_AUTH_PW;
$query = mysql_query("select * from users where user_id='$user_id' and password='$password'");
if(!mysql_num_rows($query)) {
Header("WWW-authenticate: basic realm="XXX"");
Header("HTTP/1.0 401 Unauthorized");
$title="Login Instructions";
?>
<blockquote>
In order to enter this section of the web site, you must be an XXX
subscriber. If you are a subscriber and you are having trouble
logging in,
please contact <a href="mailto:[email protected]">[email protected]</a>.
</blockquote>
<?php
exit;
}
$name=mysql_result($query,0,"name");
$email=mysql_result($query,0,"email");
mysql_free_result($query);
}
?>

另外一个参考的例程:

<?php
//assume user is not authenticated
$auth = false;
$user = $_SERVER['PHP_AUTH_USER'];
$pass = $_SERVER['PHP_AUTH_PW'];
if ( isset($user) && isset($pass) )
{
//connect to db
include 'db_connect.php';
//SQL query to find if this entered username/password is in the db
$sql = "SELECT * FROM healthed_workshop_admin WHERE
user = '$PHP_AUTH_USER' AND
pass = '$PHP_AUTH_PW'";
//put the SQL command and SQL instructions into variable
$result = mysql_query($sql) or die('Unable to connect.');
//get number or rows in command; if more than 0, row is found
$num_matches = mysql_num_rows($result);
if ($num_matches !=0)
{
//matching row found authenticates user
$auth = true;
}
}
if (!$auth)
{
header('WWW-Authenticate: Basic realm="Health Ed Presentation Admin"');
header('HTTP/1.0 401 Unauthorized');
echo 'You must enter a valid username & password.';
exit;
}
else
{
echo 'Success!';
}
?>

扩展:

$_SERVER[‘PHP_AUTH_USER’]

       PHP 的 HTTP 认证机制仅在 PHP 以 Apache 模块方式运行时才有效,因此该功能不适用于 CGI 版本。在 Apache 模块的 PHP 脚本中,可以用 header() 函数来向客户端浏览器发送&ldquo;Authentication Required&rdquo;信息,使其弹出一个用户名/密码输入窗口。当用户输入用户名和密码后,包含有 URL 的 PHP 脚本将会再次和预定义变量 PHP_AUTH_USER、PHP_AUTH_PW 和 AUTH_TYPE 一起被调用,这三个变量分别被设定为用户名,密码和认证类型。预定义变量保存在 $_SERVER 或者 $HTTP_SERVER_VARS 数组中。系统仅支持“基本的”认证

 

复制代码
<?php
$authorization = false;
if($_SERVER['PHP_AUTH_USER'] == "admin" && $_SERVER['PHP_AUTH_PW'] == "admin888"){
    echo "login";
    $authorization = true;
    exit;
}
if(!$authorization){
    header("WWW-Authenticate:Basic realm='Private'");
    header('HTTP/1.0 401 Unauthorized');
    print "You are unauthorized to enter this area.";
}
?>

基于Python爬取转转30W二手交易数据

*爬取结果

1.获取二手商品分类Link

import requests
from bs4 import BeautifulSoup
import pymongo
client = pymongo.MongoClient('localhost',27017)
db_project = client['58project']
cateLinks = db_project['cateLinks']

def getCateLink():
    baseUrl = 'http://sz.58.com/sale.shtml'
    wbData = requests.get(baseUrl)
    soup = BeautifulSoup(wbData.text,'lxml')
    links = soup.select('ul.ym-submnu > li > b > a')
    return list(set('http://sz.58.com{}'.format(link.get('href')) for link in links))

links = getCateLink()
for link in links:
    cateLinks.insert_one({'cateLinks':link})

2.获取二手商品Link

import requests
from bs4 import BeautifulSoup
import time
import random
import pymongo
import re

client = pymongo.MongoClient('localhost', 27017)
db_project = client['58project']
goodsLinksSheet = db_project['new_goodsLinks']
errorNote = db_project['errorNote']


def getGoodsLink(cateLink, page):
    url = '{}pn{}/'.format(cateLink, str(page))
    wbData = requests.get(url)
    if not wbData:
        errorNote.insert_one({'cateLink':cateLink,'page':page})
        return False
    soup = BeautifulSoup(wbData.text, 'lxml')
    if not soup.find('div', 'noinfotishi'):
        time.sleep(random.randrange(2, 4))
        for link in soup.find_all('a', class_='t', href=re.compile('zhuanzhuan.58')):
            goodsLink = link.get('href').split('?')[0]
            goodsLinksSheet.insert_one({'goodsLink': goodsLink})
            print(cateLink,'|',page,'|', goodsLink)
    else:
        return False

3.获取二手商品详情信息并逐条插入MongoDB

import requests
from bs4 import BeautifulSoup
import time
from getMongoData import getData
import pymongo
import random
import re

client = pymongo.MongoClient('localhost', 27017)
projectDb = client['58project']
goodsInfo = projectDb['goodsInfo']
fiter = projectDb['fiter']


def getGoodsinfo(goodsLink):
    if goodsInfo.find({'goodsLink': goodsLink}).count() == 0:
        time.sleep(random.randrange(3, 5))
        wbData = requests.get(goodsLink)
        if wbData.status_code == 200:
            soup = BeautifulSoup(wbData.text, 'lxml')
            if soup.find('p',class_ = 'personal_name'):
                name = soup.find('p', class_='personal_name').get_text() if soup.find('p', class_='personal_name') else None
                join58Age = re.sub('\D', '', soup.find('p', class_='personal_chengjiu').get_text()) if soup.find('p',class_='personal_chengjiu') else 0
                orderNum = soup.find('span', class_='numdeal').get_text() if soup.find('span', class_='numdeal') else 0
                title = soup.find('h1', class_='info_titile').get_text() if soup.find('h1', class_='info_titile') else None
                viewTimes = re.sub('\D', '', soup.find('span', class_='look_time').get_text()) if soup.find('span', class_='look_time') else 0
                price = soup.select('.price_now  i')[0].get_text() if soup.select('.price_now  i') else 0
                address = soup.select('.palce_li  i')[0].get_text().split('-') if soup.select('.palce_li  i') else None
                bodyPic = list(map(lambda x: x.get('src'), soup.select('div.boby_pic > img'))) if soup.select(
                    'div.boby_pic > img') else None
                describe = soup.select('.baby_kuang p')[0].get_text() if soup.select('.baby_kuang p') else None
                headImgLink = soup.select('.personal_touxiang img')[0].get('src') if soup.select('.personal_touxiang img') else None
                bodyPic = '|'.join(bodyPic) if bodyPic != None else None
                data = {
                    'name': name,
                    'join58Age': int(join58Age) if join58Age != '' else 0,
                    'orderNum': int(orderNum),
                    'title': title,
                    'viewTimes': int(viewTimes) if viewTimes != '' else 0,
                    'price': int(price) if price.isdigit() else 0,
                    'address': address,
                    'describe': describe,
                    'headImgLink': headImgLink,
                    'bodyPic': bodyPic,
                    'goodsLink': goodsLink
                }
                goodsInfo.insert_one(data)
                fiter.insert_one({'url': goodsLink})
                print(data, '\n')
    else:
        print(goodsLink)

def deleteData():
    res = goodsInfo.delete_many({'name': ''})
    print(res)

def getLineNum():
    # goodsLink = getData('new_goodsLinks')
    # res = set(map(lambda x:x['goodsLink'],goodsLink))
    res = goodsInfo.distinct('goodsLink')
    print(len(res))

def repeat():
    links = goodsInfo.find()
    tmp = set()
    for link in links:
        if link['goodsLink'] not in tmp:
            tmp.add(link['goodsLink'])
        else:
            # goodsInfo.delete_one({'goodsLink':link['goodsLink']})
            print(link)

4.获取MongoDB已存储的数据

import pymongo

def getData(sheetName):
    client = pymongo.MongoClient('localhost',27017)
    projectDb = client['58project']
    sheetObj = projectDb[sheetName]
    return sheetObj.find()

5.执行入口主文件,多线程异步执行爬去任务

from getGoodsLinks import getGoodsLink
from multiprocessing import Pool
from getMongoData import getData
from getGoodsInfo import getGoodsinfo
from getGoodsInfo import deleteData
from getGoodsInfo import getLineNum
from getGoodsInfo import repeat


def startSpider(cateLink):
    for page in range(1, 101):
        if not getGoodsLink(cateLink, str(page)):
            continue


# if __name__ == '__main__':
#     pool = Pool(processes = 8)
#     links = getCateLink()
#     pool.map(startSpider,links)
# 
# if __name__ == '__main__':
#     goodsLink = getData('new_goodsLinks')
#     urlPond = list(map(lambda x:x['goodsLink'],goodsLink))
#     pool = Pool(processes=4)
#     pool.map(getGoodsinfo, urlPond)
# 
# if __name__ == '__main__':
#     getLineNum()

 

Mysql查询复习

-- 1、查询"01"课程比"02"课程成绩高的学生的信息及课程分数
SELECT
  a.*,
  b.s_score AS 01_score,
  c.s_score AS 02_score
FROM
  student a
  JOIN score b ON a.s_id = b.s_id AND b.c_id = '01'
  LEFT JOIN score c ON a.s_id = c.s_id AND c.c_id = '02' OR c.c_id = NULL
WHERE b.s_score > c.s_score;

SELECT
  st.*,
  sc1.s_score AS 01_score,
  sc2.s_score AS 02_score
FROM score AS sc1, score AS sc2, student AS st
WHERE sc1.c_id = '01' AND (sc2.c_id = '02' OR sc2.c_id = NULL) AND sc1.s_id = sc2.s_id AND sc1.s_score > sc2.s_score AND
      sc1.s_id = st.s_id;

-- 2、查询"01"课程比"02"课程成绩低的学生的信息及课程分数

SELECT
  a.*,
  b.s_score AS 01_score,
  c.s_score AS 02_score
FROM
  student a LEFT JOIN score b ON a.s_id = b.s_id AND b.c_id = '01' OR b.c_id = NULL
  JOIN score c ON a.s_id = c.s_id AND c.c_id = '02'
WHERE b.s_score < c.s_score;

SELECT
  st.*,
  sc1.s_score AS 01_score,
  sc2.s_score AS 02_score
FROM score AS sc1, score AS sc2, student AS st
WHERE sc1.c_id = '01' AND (sc2.c_id = '02' OR sc2.c_id = NULL) AND sc1.s_id = sc2.s_id AND sc1.s_score < sc2.s_score AND
      sc1.s_id = st.s_id;

-- 3、查询平均成绩大于等于60分的同学的学生编号和学生姓名和平均成绩
SELECT
  b.s_id,
  b.s_name,
  ROUND(AVG(a.s_score), 2) AS avg_score
FROM
  student b
  JOIN score a ON b.s_id = a.s_id
GROUP BY b.s_id, b.s_name
HAVING ROUND(AVG(a.s_score), 2) >= 60;

SELECT
  st.*,
  round(avg(s_score), 2) AS avg_score
FROM score AS sc, student AS st
WHERE sc.s_id = st.s_id
GROUP BY s_id
HAVING avg(s_score) >= 60;

-- 4、查询平均成绩小于60分的同学的学生编号和学生姓名和平均成绩
-- (包括有成绩的和无成绩的)

SELECT
  b.s_id,
  b.s_name,
  ROUND(AVG(a.s_score), 2) AS avg_score
FROM
  student b
  LEFT JOIN score a ON b.s_id = a.s_id
GROUP BY b.s_id, b.s_name
HAVING ROUND(AVG(a.s_score), 2) < 60
UNION
SELECT
  a.s_id,
  a.s_name,
  0 AS avg_score
FROM
  student a
WHERE a.s_id NOT IN (
  SELECT DISTINCT s_id
  FROM score);

-- 5、查询所有同学的学生编号、学生姓名、选课总数、所有课程的总成绩
SELECT
  a.s_id,
  a.s_name,
  count(b.c_id)  AS sum_course,
  sum(b.s_score) AS sum_score
FROM
  student a
  LEFT JOIN score b ON a.s_id = b.s_id
GROUP BY a.s_id, a.s_name;


SELECT
  st.s_id,
  st.s_name,
  count(sc.c_id),
  sum(sc.s_score)
FROM student AS st LEFT JOIN score AS sc ON st.s_id = sc.s_id
GROUP BY st.s_id;

-- 6、查询"李"姓老师的数量
SELECT count(*)
FROM teacher
WHERE t_name LIKE '李%';

-- 7、查询学过"张三"老师授课的同学的信息
SELECT a.*
FROM
  student a
  JOIN score b ON a.s_id = b.s_id
WHERE b.c_id IN (
  SELECT c_id
  FROM course
  WHERE t_id = (
    SELECT t_id
    FROM teacher
    WHERE t_name = '张三'));

SELECT *
FROM student AS st
WHERE st.s_id IN (SELECT sc.s_id
                  FROM score AS sc
                  WHERE sc.c_id IN (SELECT co.c_id
                                    FROM course AS co
                                    WHERE co.t_id IN (SELECT t_id
                                                      FROM teacher AS te
                                                      WHERE te.t_name = '张三')));

-- 8、查询没学过"张三"老师授课的同学的信息
SELECT *
FROM
  student c
WHERE c.s_id NOT IN (
  SELECT a.s_id
  FROM student a
    JOIN score b ON a.s_id = b.s_id
  WHERE b.c_id IN (
    SELECT c_id
    FROM course
    WHERE t_id = (
      SELECT t_id
      FROM teacher
      WHERE t_name = '张三')));

SELECT *
FROM student AS st
WHERE st.s_id NOT IN (SELECT sc.s_id
                      FROM score AS sc
                      WHERE sc.c_id IN (SELECT co.c_id
                                        FROM course AS co
                                        WHERE co.t_id IN (SELECT t_id
                                                          FROM teacher AS te
                                                          WHERE te.t_name = '张三')));
-- 9、查询学过编号为"01"并且也学过编号为"02"的课程的同学的信息

SELECT a.*
FROM
  student a, score b, score c
WHERE a.s_id = b.s_id AND a.s_id = c.s_id AND b.c_id = '01' AND c.c_id = '02';

SELECT st.*
FROM score sc_1, score sc_2, student st
WHERE sc_1.s_id = sc_2.s_id AND sc_1.c_id = '01' AND sc_2.c_id = '02' AND sc_1.s_id = st.s_id;

-- 10、查询学过编号为"01"但是没有学过编号为"02"的课程的同学的信息

SELECT a.*
FROM
  student a
WHERE a.s_id IN (SELECT s_id
                 FROM score
                 WHERE c_id = '01') AND a.s_id NOT IN (SELECT s_id
                                                       FROM score);

SELECT st.*
FROM score sc_1, score sc_2, student st
WHERE sc_1.s_id = sc_2.s_id AND sc_1.c_id = '01' AND sc_2.c_id = '02' AND sc_1.s_id = st.s_id;

-- 11、查询没有学全所有课程的同学的信息
SELECT s.*
FROM
  student s
WHERE s.s_id IN (
  SELECT s_id
  FROM score
  WHERE s_id NOT IN (
    SELECT a.s_id
    FROM score a
      JOIN score b ON a.s_id = b.s_id AND b.c_id = '02'
      JOIN score c ON a.s_id = c.s_id AND c.c_id = '03'
    WHERE a.c_id = '01'));

SELECT st.*
FROM student st
WHERE st.s_id IN (
  SELECT sc.s_id
  FROM score sc
  WHERE sc.s_id NOT IN (
    SELECT sc.s_id
    FROM score sc
    GROUP BY sc.s_id
    HAVING count(*) >= (SELECT count(*)
                        FROM course)));

-- 12、查询至少有一门课与学号为"01"的同学所学相同的同学的信息
SELECT *
FROM student
WHERE s_id IN (
  SELECT DISTINCT a.s_id
  FROM score a
  WHERE a.c_id IN (SELECT a.c_id
                   FROM score a
                   WHERE a.s_id = '01')
);


SELECT st.*
FROM student st
WHERE st.s_id IN (SELECT DISTINCT s_id
                  FROM score
                  WHERE c_id IN (SELECT sc.c_id
                                 FROM score sc
                                 WHERE sc.s_id = '01'));

-- 13、查询和"01"号的同学学习的课程完全相同的其他同学的信息
SELECT a.*
FROM student a
WHERE a.s_id IN (
  SELECT DISTINCT s_id
  FROM score
  WHERE s_id != '01' AND c_id IN (SELECT c_id
                                  FROM score
                                  WHERE s_id = '01')
  GROUP BY s_id
  HAVING count(1) = (SELECT count(1)
                     FROM score
                     WHERE s_id = '01'));


SELECT st.*
FROM student st
WHERE st.s_id != '01' AND st.s_id IN (
  SELECT s_id
  FROM score
  WHERE c_id IN (SELECT c_id
                 FROM score
                 WHERE s_id = '01')
  GROUP BY s_id
  HAVING count(1) = (SELECT count(1)
                     FROM score
                     WHERE s_id = '01'));

-- 14、查询没学过"张三"老师讲授的任一门课程的学生姓名
SELECT a.s_name
FROM student a
WHERE a.s_id NOT IN (
  SELECT s_id
  FROM score
  WHERE c_id =
        (SELECT c_id
         FROM course
         WHERE t_id = (
           SELECT t_id
           FROM teacher
           WHERE t_name = '张三'))
  GROUP BY s_id);

SELECT s_name
FROM student
WHERE s_id NOT IN (
  SELECT s_id
  FROM score
  WHERE c_id = (SELECT c_id
                FROM teacher te, course co
                WHERE te.t_name = '张三' AND co.t_id = te.t_id));




数据过滤应该在SQL层和应用层的选择

实际开发中,经常性的从数据库读取数据,但是数据库中的数据并不全是业务逻辑实际需要的,这时,有两个选择,一个是在构建SQL查询语句时就写where子句进行过滤,另一个是将数据悉数读取出来,在客户机利用代码对返回数据进行过滤。

如果数据量较小,此两者方式差距不大。尽管关系型数据库的逻辑处理性能偏弱,但在数据行数较多的情况下,在SQL层查询时过滤往往能有效缩短客户机的请求时间。

经自己测试,应用层利用代码处理数据会极大影响应用的性能,并且造成代码所处的那处业务逻辑不具备可伸缩性,服务器也不得不发送多余的数据,导致网络宽带的浪费。