Commonly Used Python Modules & Packages

2020-05-10 Python series

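Common Modules & Packages

re module

Introduction to regular expressions

Concept
A regular expression (often shortened to regex or regexp) is a concept from computer science: a single pattern string that describes, and is used to match, a whole set of strings that follow a given syntactic rule.
Character classes
The set of characters that may appear at one position forms a character class, written with [] in a regular expression.

Regex	String to match	Explanation
[0123456789]	6	The class lists the character to match
[abcdefghij]	a	The class lists the character to match
[0-9]	7	A - inside the class denotes a range; the range contains the character to match
[a-z]	a	A - inside the class denotes a range; the range contains the character to match
[A-Za-z0-9]	a	A class may combine several ranges; one of them contains the character to match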

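Characters

Metacharacter	Explanation
.	Matches any character except a newline (\n)
\w	Matches a letter, a digit, or an underscore (_)
\s	Matches any whitespace character: space, tab, form feed, and so on. For ASCII text this is equivalent to the character class [ \f\n\r\t\v]
\d	Matches a digit
\n	Matches a newline
\t	Matches a tab
\b	Anchor: matches a word boundary, a position with a word character on one side and a non-word character or the start or end of the string on the other. \b is zero-width
^	Matches the start of the string
$	Matches the end of the string
\W	Matches any character that is not a letter, digit, or underscore
\D	Matches a non-digit
\S	Matches a non-whitespace character
\B	Anchor: matches a position that is not a word boundary
a|b	Matches a or b
()	Matches the expression inside the parentheses and also defines a group
[…]	Matches any character in the class
[^…]	Matches any character that is not in the class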

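Quantifiers

Quantifier	Explanation
*	Repeat zero or more times
+	Repeat one or more times
?	Repeat zero times or once
{n}	Repeat exactly n times
{n,}	Repeat n or more times
{n,m}	Repeat between n and m times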
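The quantifier table can be checked with a short sketch. The sample patterns and strings below are illustrative choices, not taken from the original post; re.fullmatch() is used so that each pattern has to cover the whole string.

```python
import re

# Each pattern is paired with a string it should match completely.
checks = {
    r"ab*": "a",          # * allows zero b's
    r"ab+": "abbb",       # + needs at least one b
    r"colou?r": "color",  # ? makes the u optional
    r"\d{4}": "2020",     # exactly four digits
    r"\d{2,}": "12345",   # two or more digits
    r"\d{2,4}": "123",    # between two and four digits
}
results = {p: bool(re.fullmatch(p, text)) for p, text in checks.items()}
print(results)

# + rejects the zero-repetition case that * accepts.
plus_on_bare_a = re.fullmatch(r"ab+", "a")
print(plus_on_bare_a)  # None
```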

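The escape character \ and the r prefix

Character	Explanation
`\`	Many metacharacters start with `\`. To match a literal backslash, escape it with another `\`, which gives the regex `\\`
r	Python string literals also treat `\` as an escape character, so the regex `\\` would have to be written `'\\\\'` in an ordinary string. Prefixing the literal with r (a raw string, for example `r'\\'` or `r'\d+'`) tells Python to leave every `\` untouched, so the pattern reaches the re module exactly as written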

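Greedy matching
When a pattern can match in more than one way, a greedy quantifier matches the longest possible string. Quantifiers are greedy by default; appending ? to a quantifier (*?, +?, ??, {n,m}?) makes it match as little as possible instead.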
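A small sketch of the difference between greedy and non-greedy matching. The HTML-like sample string is an assumption made up for this illustration.

```python
import re

html = "<b>bold</b> and <i>italic</i>"

greedy = re.findall(r"<.+>", html)  # .+ grabs as much as it can
lazy = re.findall(r"<.+?>", html)   # .+? stops at the first '>'

print(greedy)  # ['<b>bold</b> and <i>italic</i>']
print(lazy)    # ['<b>', '</b>', '<i>', '</i>']
```

The greedy pattern swallows everything up to the last `>` in the string, while the non-greedy one yields each tag separately.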

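About the re module

The re module gives Python full regular-expression support. re.compile() builds a pattern object whose methods perform matching and substitution, and the module also provides module-level functions that behave exactly like those methods; these functions take the pattern string as their first argument.

Common re functions

findall(): returns a list of all non-overlapping matches
search(): scans the string and returns a match object for the first match; call group() on it to get the matched text. If nothing matches, search() returns None, so calling group() on the result raises AttributeError
match(): like search(), but only matches at the beginning of the string. It returns a match object, or None if the string does not start with a match, in which case calling group() raises AttributeError
split(): splits the string at every match of the pattern and returns the list of pieces
sub(): replaces matches of the pattern with the replacement string and returns the new string; the optional count argument limits the number of replacements
subn(): like sub() (it also accepts count), but returns a tuple containing the new string and the number of replacements made
compile(): compiles a regular expression into a pattern object, whose methods (match(), search(), findall() and so on) can then be called directly
finditer(): returns an iterator of match objects; take them with next() or a for loop and call group() on each one
Group priority in findall(): if the pattern contains a capturing group, findall() returns only the text matched by the group; write the group as (?:...) to make it non-capturing and get the whole match back
Group priority in split(): if the pattern contains a capturing group, the separators matched by the group are also included in the result list; (?:...) disables this as well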

# Example

import re

s = 'life1is2short,I3use4python5' # string to search

res = re.findall(r"\d", s)
print(res)
>>> ['1', '2', '3', '4', '5']


res = re.search("1", s)
print(res.group())
>>> 1


res = re.match("1", s) # s does not start with "1", so there is no match
print(res)
>>> None


res = re.split(r"\d", s)
print(res)
>>> ['life', 'is', 'short,I', 'use', 'python', '']


res = re.sub(r"\d", "___", s, count=2) # replace only the first two matches
print(res)
>>> life___is___short,I3use4python5


res = re.subn(r"\d", "___", s) # returns (new string, number of replacements)
print(res)
>>> ('life___is___short,I___use___python___', 5)


prog = re.compile(r"\d")
res = prog.findall(s)
# equivalent to res = re.findall(r"\d", s)
print(res)
>>> ['1', '2', '3', '4', '5']


res = re.finditer(r"\d", s)
print([i.group() for i in res]) # finditer returns an iterator of match objects
>>> ['1', '2', '3', '4', '5']
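The group behavior of findall() and split() listed among the common functions is easier to see side by side. This is a minimal sketch; the shortened sample string `text` is an assumption chosen for the illustration.

```python
import re

text = "life1is2short"

# findall: a capturing group changes what is returned.
whole = re.findall(r"[a-z]+\d", s := text)       # no group: whole matches
captured = re.findall(r"[a-z]+(\d)", text)       # group: only the group text
non_capturing = re.findall(r"[a-z]+(?:\d)", text)  # (?:...) restores whole matches

print(whole)          # ['life1', 'is2']
print(captured)       # ['1', '2']
print(non_capturing)  # ['life1', 'is2']

# split: a capturing group keeps the separators in the result.
plain = re.split(r"\d", text)
with_seps = re.split(r"(\d)", text)
without_seps = re.split(r"(?:\d)", text)

print(plain)         # ['life', 'is', 'short']
print(with_seps)     # ['life', '1', 'is', '2', 'short']
print(without_seps)  # ['life', 'is', 'short']
```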

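collections module

About the collections module

This module implements specialized container datatypes that provide alternatives to Python's general-purpose built-in containers dict, list, set, and tuple.

Common collections types

Name	Purpose
namedtuple()	Factory function for creating tuple subclasses with named fields
deque	List-like container with fast appends and pops on either end
ChainMap	Dict-like class for creating a single view of multiple mappings
Counter	dict subclass for counting hashable objects
OrderedDict	dict subclass that remembers the order in which entries were added
defaultdict	dict subclass that calls a factory function to supply missing values
UserDict	Wrapper around dictionary objects for easier dict subclassing
UserList	Wrapper around list objects for easier list subclassing
UserString	Wrapper around string objects for easier string subclassing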


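# -------------------
# namedtuple
# -------------------

The new subclass is used to create tuple-like objects whose fields are accessible by attribute (field name) lookup as well as by index and iteration.
from collections import namedtuple
Point = namedtuple('Point', ['x', 'y'])
p = Point(11, y=22)
res = p[0] + p[1] # p.x + p.y gives the same result
print(res)

# Output
>>> 33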

# -------------------
# deque
# -------------------
Returns a new deque (double-ended queue) object, initialized left to right (using append()) with data from an iterable.

>>> from collections import deque
>>> d = deque('ghi') # make a new deque with three items
>>> for elem in d: # iterate over the deque's elements
...     print(elem.upper())
G
H
I

>>> d.append('j') # add a new entry to the right side
>>> d.appendleft('f') # add a new entry to the left side
>>> d # show the representation of the deque
deque(['f', 'g', 'h', 'i', 'j'])

>>> d.pop() # return and remove the rightmost item
'j'
>>> d.popleft() # return and remove the leftmost item
'f'
>>> list(d) # list the contents of the deque
['g', 'h', 'i']
>>> d[0] # peek at the leftmost item
'g'
>>> d[-1] # peek at the rightmost item
'i'

>>> list(reversed(d)) # list the contents of the deque in reverse
['i', 'h', 'g']
>>> 'h' in d # search the deque
True
>>> d.extend('jkl') # add multiple elements at once
>>> d
deque(['g', 'h', 'i', 'j', 'k', 'l'])
>>> d.rotate(1) # rotate all elements one step to the right
>>> d
deque(['l', 'g', 'h', 'i', 'j', 'k'])
>>> d.rotate(-1) # rotate all elements one step to the left
>>> d
deque(['g', 'h', 'i', 'j', 'k', 'l'])

>>> deque(reversed(d)) # make a new deque in reverse order
deque(['l', 'k', 'j', 'i', 'h', 'g'])
>>> d.clear() # empty the deque
>>> d.pop() # cannot pop from an empty deque
Traceback (most recent call last):
    File "<pyshell#6>", line 1, in -toplevel-
        d.pop()
IndexError: pop from an empty deque

>>> d.extendleft('abc') # add multiple elements on the left; note that the input order is reversed
>>> d
deque(['c', 'b', 'a'])

# -------------------
# ChainMap
# -------------------
Groups multiple dicts or other mappings together to create a single, updatable view.

from collections import ChainMap
baseline = {'music': 'bach', 'art': 'rembrandt'}
adjustments = {'art': 'van gogh', 'opera': 'carmen'}
print(list(ChainMap(adjustments, baseline)))

# Output
>>> ['music', 'art', 'opera']

# Iteration order is the same as a series of dict.update() calls starting with the last mapping:
from collections import ChainMap
baseline = {'music': 'bach', 'art': 'rembrandt'}
adjustments = {'art': 'van gogh', 'opera': 'carmen'}
combined = baseline.copy()
combined.update(adjustments)
print(list(combined))

# Output
>>> ['music', 'art', 'opera']
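The phrase "updatable view" can be illustrated with a brief sketch that reuses the same two dictionaries. Lookups search the mappings in order, while writes made through the ChainMap only touch the first mapping.

```python
from collections import ChainMap

baseline = {'music': 'bach', 'art': 'rembrandt'}
adjustments = {'art': 'van gogh', 'opera': 'carmen'}
combined = ChainMap(adjustments, baseline)

art = combined['art']             # found in the first mapping
music_before = combined['music']  # falls through to the second mapping
print(art, music_before)          # van gogh bach

# The view is live: changing an underlying dict is visible immediately.
baseline['music'] = 'mozart'
music_live = combined['music']
print(music_live)                 # mozart

# Writing through the ChainMap only changes the first mapping.
combined['music'] = 'chopin'
print(adjustments['music'])       # chopin
print(baseline['music'])          # mozart
```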


# -------------------
# Counter:
# -------------------
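A dict subclass for counting hashable objects. It is a collection in which elements are stored as dictionary keys and their counts are stored as the values.

from collections import Counter

c = Counter() # a new, empty counter
c = Counter('gallahad') # a new counter from an iterable (here a string)
c = Counter({'red': 4, 'blue': 2}) # a new counter from a mapping
c = Counter(cats=4, dogs=8) # a new counter from keyword arguments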
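Once a Counter exists, its counts can be read like dictionary values. The following sketch uses the 'gallahad' string from the constructors above; most_common() and update() are standard Counter methods.

```python
from collections import Counter

c = Counter('gallahad')

a_count = c['a']        # counts are stored as values
z_count = c['z']        # missing keys count as zero instead of raising KeyError
top_two = c.most_common(2)

print(a_count)  # 3
print(z_count)  # 0
print(top_two)  # [('a', 3), ('l', 2)]

c.update('aaa')         # add more observations to the existing counts
print(c['a'])           # 6
```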

# -------------------
# OrderedDict
# -------------------

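It has methods dedicated to rearranging the order of the dictionary.

>>> from collections import OrderedDict
>>> d = OrderedDict.fromkeys('abcde')
>>> d.move_to_end('b') # move 'b' to the right end
>>> ''.join(d.keys())
'acdeb'
>>> d.move_to_end('b', last=False) # move 'b' to the left end
>>> ''.join(d.keys())
'bacde'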


# -------------------
# defaultdict
# ------------------- 
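Returns a new dict-like object. defaultdict overrides one method (__missing__) and adds one writable instance variable (default_factory); a missing key is created on first access with the value returned by the factory.

>>> from collections import defaultdict
>>> s = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)]
>>> d = defaultdict(list)
>>> for k, v in s:
...     d[k].append(v)
...
>>> sorted(d.items())
[('blue', [2, 4]), ('red', [1]), ('yellow', [1, 3])]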


# -------------------
# UserDict
# ------------------- 
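A class that simulates a dictionary. The instance's contents are kept in a regular dictionary, which is accessible through the data attribute of UserDict instances.


# -------------------
# UserList
# ------------------- 
This class wraps list objects. It is a useful base class for your own list-like classes, which can inherit from it and override existing methods or add new ones. In this way, new behaviors can be added to lists.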
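As a minimal sketch of subclassing UserList, the hypothetical `UniqueList` class below (not part of the standard library or the original post) overrides append() so that duplicates are ignored. The wrapped real list is available as `self.data`.

```python
from collections import UserList

class UniqueList(UserList):
    """Hypothetical list-like class that ignores duplicate items on append."""

    def append(self, item):
        # self.data is the ordinary list that UserList wraps.
        if item not in self.data:
            self.data.append(item)

u = UniqueList([1, 2])
u.append(2)  # already present, ignored
u.append(3)  # new, appended

print(u)       # [1, 2, 3]
print(len(u))  # 3
```

Only append() is overridden here; other ways of adding items, such as extend() or insert(), keep their normal list behavior in this sketch.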


# -------------------
# UserString
# ------------------- 
Acts as a wrapper around string objects. The need for this class has been partially supplanted by the ability to subclass str directly.

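time module

Three ways to represent time

Timestamp: a timestamp is normally the offset in seconds from the epoch, 1970-01-01 00:00:00 UTC.

import time
print(time.time())

>>> 1515570662.77503

Structured time: a struct_time tuple has 9 elements (year, month, day of the month, hour, minute, second, day of the week, day of the year, and the daylight-saving flag)

import time
print(time.localtime())

>>> time.struct_time(tm_year=2018, tm_mon=1, tm_mday=10, tm_hour=16, tm_min=0, tm_sec=33, tm_wday=2, tm_yday=10, tm_isdst=0)

Time string: the year, month, day, and time of day written as a string

import time
print(time.strftime("%Y-%m-%d %H-%M-%S"))

>>> 2018-01-10 15-59-16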

Converting between the formats

Timestamp -> structured time

# time.gmtime(timestamp) # UTC time
# time.localtime(timestamp) # local time. For example, Beijing time is 8 hours ahead of UTC: UTC + 8 hours = Beijing time

import time

timestamp = 1515570662.77503

# Timestamp -> local time (the output below was produced in the UTC+8 time zone)
print(time.localtime(timestamp))
>>> time.struct_time(tm_year=2018, tm_mon=1, tm_mday=10, tm_hour=15, tm_min=51, tm_sec=2, tm_wday=2, tm_yday=10, tm_isdst=0)

# Timestamp -> UTC time
print(time.gmtime(timestamp))
>>> time.struct_time(tm_year=2018, tm_mon=1, tm_mday=10, tm_hour=7, tm_min=51, tm_sec=2, tm_wday=2, tm_yday=10, tm_isdst=0)

Structured time -> timestamp

# time.mktime(struct_time), where struct_time is interpreted as local time

import time
time_tuple  = time.localtime(1500000000)
print(time.mktime(time_tuple))

>>> 1500000000.0

Time string -> structured time

# time.strptime(time string, format matching the string)

import time
print(time.strptime("2017-03-16","%Y-%m-%d"))

>>> time.struct_time(tm_year=2017, tm_mon=3, tm_mday=16, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=3, tm_yday=75, tm_isdst=-1)

print(time.strptime("07/24/2017","%m/%d/%Y"))

>>> time.struct_time(tm_year=2017, tm_mon=7, tm_mday=24, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=0, tm_yday=205, tm_isdst=-1)

Structured time -> time string

# time.strftime("format", struct_time); if the struct_time argument is omitted, the current local time is used

import time
print(time.strftime("%Y-%m-%d %X"))
>>> 2018-01-10 16:19:46
print(time.strftime("%Y-%m-%d",time.localtime(1515570662.77503)))

>>> 2018-01-10
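The individual conversions can be chained into a full round trip. The sketch below starts from an arbitrary sample string (an assumption for this illustration) and goes string, struct_time, timestamp, struct_time, string, using only the functions covered in this section.

```python
import time

text = "2017-03-16 08:30:00"
fmt = "%Y-%m-%d %H:%M:%S"

struct = time.strptime(text, fmt)      # time string -> struct_time
timestamp = time.mktime(struct)        # struct_time (local time) -> timestamp
back = time.localtime(timestamp)       # timestamp -> struct_time (local time)
round_trip = time.strftime(fmt, back)  # struct_time -> time string

print(struct.tm_year, struct.tm_mon, struct.tm_mday)  # 2017 3 16
print(round_trip)                                      # 2017-03-16 08:30:00
```

The numeric value of `timestamp` depends on the local time zone of the machine, which is why it is not printed here.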

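random module

About the random module

This module implements pseudo-random number generators for various distributions.

Common random functions

Random floats

import random
print(random.random()) # a random float in the range [0.0, 1.0)
>>> 0.11828833626857149

import random
print(random.uniform(1,5)) # a random float between the two bounds
>>> 2.164732131520036

Random integers

import random
print(random.randint(5,10)) # randint includes both endpoints
>>> 8

import random
print(random.randrange(5,10,2)) # randrange includes the start but excludes the stop, and accepts a step
>>> 7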

Return one random element from a list

import random
print(random.choice([1,'23',[4,5],(6,7)]))
>>> (6, 7)

Return several random elements from a list; the number of elements is specified by the second argument

import random
print(random.sample([1,'23',[4,5],(6,7)],2)) 
>>> [(6, 7), 1]

Shuffle a list in place

import random
item=[1,2,3,4,5,6,7,8,9]
random.shuffle(item)
print(item)
>>> [2, 6, 8, 3, 5, 4, 7, 1, 9]
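Because the generators are pseudo-random, the sequence is determined by the seed. The functions above do not show this, so the following is an added illustration using random.seed() and random.Random, both of which are part of the standard module; the seed values are arbitrary.

```python
import random

random.seed(42)
first = [random.randint(1, 100) for _ in range(5)]

random.seed(42)  # same seed, so the same sequence is produced again
second = [random.randint(1, 100) for _ in range(5)]

print(first == second)  # True

# A separate Random instance keeps its own state, independent of the module-level functions.
rng = random.Random(7)
values = [rng.random() for _ in range(3)]
print(all(0.0 <= v < 1.0 for v in values))  # True
```

Seeding is useful for reproducible tests and simulations; it should not be relied on for anything security-related.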

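os module

About the os module

This module provides a portable way of using operating-system-dependent functionality.

Common os functions

Process parameters: these functions and data items provide information about, and operate on, the current process and user.

os.ctermid(): Return the filename corresponding to the controlling terminal of the process.
os.environ: A mapping object representing the string environment.
os.environb: Bytes version of environ: a mapping object representing the environment as byte strings.
os.fsencode(filename): Encode a path-like filename to the filesystem encoding with the 'surrogateescape' error handler, or 'strict' on Windows; bytes are returned unchanged.
os.fsdecode(filename): Decode a path-like filename from the filesystem encoding with the 'surrogateescape' error handler, or 'strict' on Windows; str is returned unchanged.
os.fspath(path): Return the file system representation of the path.
os.getenv(key, default=None): Return the value of the environment variable key if it exists, or default if it doesn't. key, default and the result are str.
os.getenvb(key, default=None): Return the value of the environment variable key if it exists, or default if it doesn't. key, default and the result are bytes.
os.get_exec_path(env=None): Return the list of directories that will be searched for a named executable, similar to a shell, when launching a process.
os.getegid(): Return the effective group id of the current process.
os.geteuid(): Return the current process's effective user id.
os.getgid(): Return the real group id of the current process.
os.getgrouplist(user, group): Return the list of group ids that user belongs to.
os.getgroups(): Return the list of supplemental group ids associated with the current process.
os.getlogin(): Return the name of the user logged in on the controlling terminal of the process.
os.getpgid(pid): Return the process group id of the process with process id pid.
os.getpgrp(): Return the id of the current process group.
os.getpid(): Return the current process id.
os.getppid(): Return the parent's process id.
os.getpriority(which, who): Get program scheduling priority.
os.getresuid(): Return a tuple (ruid, euid, suid) denoting the current process's real, effective, and saved user ids.
os.getresgid(): Return a tuple (rgid, egid, sgid) denoting the current process's real, effective, and saved group ids.
os.getuid(): Return the current process's real user id.
os.initgroups(username, gid): Call the system initgroups() to initialize the group access list with all of the groups of which the specified username is a member, plus the specified group id.
os.putenv(key, value): Set the environment variable named key to the string value.
os.setegid(egid): Set the current process's effective group id.
os.seteuid(euid): Set the current process's effective user id.
os.setgid(gid): Set the current process's group id.
os.setgroups(groups): Set the list of supplemental group ids associated with the current process to groups.
os.setpgrp(): Call the system call setpgrp() or setpgrp(0, 0) depending on which version is implemented (if any).
os.setpgid(pid, pgrp): Call the system call setpgid() to set the process group id of the process with id pid to pgrp.
os.setpriority(which, who, priority): Set program scheduling priority.
os.setregid(rgid, egid): Set the current process's real and effective group ids.
os.setresgid(rgid, egid, sgid): Set the current process's real, effective, and saved group ids.
os.setresuid(ruid, euid, suid): Set the current process's real, effective, and saved user ids.
os.setreuid(ruid, euid): Set the current process's real and effective user ids.
os.getsid(pid): Call the system call getsid().
os.setsid(): Call the system call setsid().
os.setuid(uid): Set the current process's user id.
os.strerror(code): Return the error message corresponding to the error code in code.
os.supports_bytes_environ: True if the native OS type of the environment is bytes (for example, False on Windows).
os.umask(mask): Set the current numeric umask and return the previous umask.
os.uname(): Return information identifying the current operating system.
os.unsetenv(key): Unset (delete) the environment variable named key.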
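A short sketch of the environment-related entries in the list above. The variable names `DEMO_SETTING` and `DEMO_NOT_SET_ANYWHERE` are hypothetical and chosen only for this illustration.

```python
import os

# Assigning to os.environ sets the variable for this process and its children.
os.environ["DEMO_SETTING"] = "on"
value = os.getenv("DEMO_SETTING")

# getenv() returns the default when the variable does not exist.
missing = os.getenv("DEMO_NOT_SET_ANYWHERE", "fallback")

# Deleting the key from os.environ removes the variable again.
del os.environ["DEMO_SETTING"]
after = os.getenv("DEMO_SETTING")

pid = os.getpid()  # id of the current process, an int

print(value, missing, after)  # on fallback None
```

Going through os.environ rather than calling os.putenv() directly keeps the mapping and the real environment in sync.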

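File object creation: these functions create new file objects.

os.fdopen(fd, *args, **kwargs): Return an open file object connected to the file descriptor fd.

File descriptor operations: these functions operate on I/O streams referenced using file descriptors.

os.close(fd): Close file descriptor fd.
os.closerange(fd_low, fd_high): Close all file descriptors from fd_low (inclusive) to fd_high (exclusive), ignoring errors.
os.copy_file_range(src, dst, count, offset_src=None, offset_dst=None): Copy count bytes from file descriptor src, starting from offset offset_src, to file descriptor dst, starting from offset offset_dst.
os.device_encoding(fd): Return a string describing the encoding of the device associated with fd if it is connected to a terminal; else return None.
os.dup(fd): Return a duplicate of file descriptor fd.
os.dup2(fd, fd2, inheritable=True): Duplicate file descriptor fd to fd2, closing the latter first if necessary.
os.fchmod(fd, mode): Change the mode of the file given by fd to the numeric mode.
os.fchown(fd, uid, gid): Change the owner and group id of the file given by fd to the numeric uid and gid.
os.fdatasync(fd): Force write of the file with file descriptor fd to disk. Does not force update of metadata.
os.fpathconf(fd, name): Return system configuration information relevant to an open file.
os.fstat(fd): Get the status of the file descriptor fd. Return a stat_result object.
os.fstatvfs(fd): Return information about the filesystem containing the file associated with file descriptor fd, like statvfs(). As of Python 3.3, this is equivalent to os.statvfs(fd).
os.fsync(fd): Force write of the file with file descriptor fd to disk.
os.ftruncate(fd, length): Truncate the file corresponding to file descriptor fd, so that it is at most length bytes in size. As of Python 3.3, this is equivalent to os.truncate(fd, length).
os.get_blocking(fd): Get the blocking mode of the file descriptor: False if the O_NONBLOCK flag is set, True if the flag is cleared.
os.isatty(fd): Return True if the file descriptor fd is open and connected to a tty(-like) device, else False.
os.lockf(fd, cmd, len): Apply, test or remove a POSIX lock on an open file descriptor.
os.lseek(fd, pos, how): Set the current position of file descriptor fd to position pos, modified by how.
os.open(path, flags, mode=0o777, *, dir_fd=None): Open the file path and set various flags according to flags and possibly its mode according to mode.
os.openpty(): Open a new pseudo-terminal pair. Return a pair of file descriptors (master, slave) for the pty and the tty, respectively.
os.pipe(): Create a pipe. Return a pair of file descriptors (r, w) usable for reading and writing, respectively.
os.pipe2(flags): Create a pipe with flags set atomically.
os.posix_fallocate(fd, offset, len): Ensure that enough disk space is allocated for the file specified by fd, starting from offset and continuing for len bytes.
os.posix_fadvise(fd, offset, len, advice): Announce an intention to access data in a specific pattern, thus allowing the kernel to make optimizations.
os.pread(fd, n, offset): Read at most n bytes from file descriptor fd at a position of offset, leaving the file offset unchanged.
os.preadv(fd, buffers, offset, flags=0): Read from file descriptor fd at a position of offset into mutable bytes-like objects buffers, leaving the file offset unchanged.
os.RWF_NOWAIT: Do not wait for data which is not immediately available.
os.RWF_HIPRI: High priority read/write.
os.pwrite(fd, str, offset): Write the bytestring in str to file descriptor fd at position offset, leaving the file offset unchanged.
os.pwritev(fd, buffers, offset, flags=0): Write the contents of buffers to file descriptor fd at offset offset, leaving the file offset unchanged.
os.RWF_DSYNC: Provide a per-write equivalent of the O_DSYNC open(2) flag.
os.RWF_SYNC: Provide a per-write equivalent of the O_SYNC open(2) flag.
os.read(fd, n): Read at most n bytes from file descriptor fd.
os.sendfile(out, in, offset, count, [headers, ][trailers, ]flags=0): Copy count bytes from file descriptor in to file descriptor out, starting at offset in the input file. Return the number of bytes sent; when EOF is reached, return 0.
os.set_blocking(fd, blocking): Set the blocking mode of the specified file descriptor: set the O_NONBLOCK flag if blocking is False, clear the flag otherwise.
os.readv(fd, buffers): Read from file descriptor fd into a number of mutable bytes-like objects buffers.
os.tcgetpgrp(fd): Return the process group associated with the terminal given by fd (an open file descriptor as returned by os.open()).
os.tcsetpgrp(fd, pg): Set the process group associated with the terminal given by fd (an open file descriptor as returned by os.open()) to pg.
os.ttyname(fd): Return a string which specifies the terminal device associated with file descriptor fd.
os.write(fd, str): Write the bytestring in str to file descriptor fd.
os.writev(fd, buffers): Write the contents of buffers to file descriptor fd.

# Querying the size of a terminal
os.get_terminal_size(fd=STDOUT_FILENO): Return the size of the terminal window as (columns, lines), a tuple of type terminal_size.
os.terminal_size: A subclass of tuple, holding (columns, lines) of the terminal window size.

# Inheritance of file descriptors
os.get_inheritable(fd): Get the "inheritable" flag of the specified file descriptor (a boolean).
os.set_inheritable(fd, inheritable): Set the "inheritable" flag of the specified file descriptor.
os.get_handle_inheritable(handle): Get the "inheritable" flag of the specified handle (a boolean).
os.set_handle_inheritable(handle, inheritable): Set the "inheritable" flag of the specified handle.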

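Files and directories

os.access(path, mode, *, dir_fd=None, effective_ids=False, follow_symlinks=True): Use the real uid/gid to test for access to path.
os.chdir(path): Change the current working directory to path.
os.chflags(path, flags, *, follow_symlinks=True): Set the flags of path to the numeric flags.
os.chmod(path, mode, *, dir_fd=None, follow_symlinks=True): Change the mode of path to the numeric mode.
os.chown(path, uid, gid, *, dir_fd=None, follow_symlinks=True): Change the owner and group id of path to the numeric uid and gid.
os.chroot(path): Change the root directory of the current process to path.
os.fchdir(fd): Change the current working directory to the directory represented by the file descriptor fd. The descriptor must refer to an opened directory, not an open file. As of Python 3.3, this is equivalent to os.chdir(fd).
os.getcwd(): Return a string representing the current working directory.
os.getcwdb(): Return a bytestring representing the current working directory.
os.lchflags(path, flags): Set the flags of path to the numeric flags, like chflags(), but do not follow symbolic links.
os.lchmod(path, mode): Change the mode of path to the numeric mode. If path is a symlink, this affects the symlink rather than the target.
os.lchown(path, uid, gid): Change the owner and group id of path to the numeric uid and gid. This function does not follow symbolic links. As of Python 3.3, this is equivalent to os.chown(path, uid, gid, follow_symlinks=False).
os.link(src, dst, *, src_dir_fd=None, dst_dir_fd=None, follow_symlinks=True): Create a hard link pointing to src named dst.
os.listdir(path='.'): Return a list containing the names of the entries in the directory given by path.
os.lstat(path, *, dir_fd=None): Perform the equivalent of an lstat() system call on the given path. Similar to stat(), but does not follow symbolic links. Return a stat_result object.
os.mkdir(path, mode=0o777, *, dir_fd=None): Create a directory named path with numeric mode mode.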
os.makedirs(name, mode=0o777, exist_ok=False):Recursive directory creation function. Like mkdir(), but makes all intermediate-level directories needed to contain the leaf directory.
os.rmdir(path, *, dir_fd=None):Remove (delete) the directory path.
os.stat(path, *, dir_fd=None, follow_symlinks=True): Get the status of a file or a file descriptor.
os.walk(top, topdown=True, onerror=None, followlinks=False):Generate the file names in a directory tree by walking the tree either top-down or bottom-up. For each directory in the tree rooted at directory top (including top itself), it yields a 3-tuple (dirpath, dirnames, filenames).
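To make the directory functions concrete, here is a minimal sketch that builds a small tree in a temporary directory and walks it. The file and directory names are made up for the illustration, and tempfile is used only so that the example cleans up after itself.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # makedirs() creates the intermediate directory "a" as well as "a/b".
    os.makedirs(os.path.join(root, "a", "b"), exist_ok=True)
    for rel in ("top.txt", os.path.join("a", "mid.txt"), os.path.join("a", "b", "leaf.txt")):
        with open(os.path.join(root, rel), "w") as f:
            f.write("x")

    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # in top-down mode, sorting in place fixes the visiting order
        for name in sorted(filenames):
            found.append(os.path.relpath(os.path.join(dirpath, name), root))

    top_level = sorted(os.listdir(root))
    size = os.stat(os.path.join(root, "top.txt")).st_size

print(found)      # top.txt, then a/mid.txt, then a/b/leaf.txt (separator depends on the OS)
print(top_level)  # ['a', 'top.txt']
print(size)       # 1
```

listdir() only reports the entries directly inside one directory, while walk() descends into every subdirectory.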

Process management

os.abort():Generate a SIGABRT signal to the current process.
os.add_dll_directory(path):Add a path to the DLL search path.

These functions all execute a new program, replacing the current process; they do not return.
   os.execl(path, arg0, arg1, ...)
   os.execle(path, arg0, arg1, ..., env)
   os.execlp(file, arg0, arg1, ...)
   os.execlpe(file, arg0, arg1, ..., env)
   os.execv(path, args)
   os.execve(path, args, env)
   os.execvp(file, args)
   os.execvpe(file, args, env)

Execute the program path in a new process.
   os.spawnl(mode, path, ...)
   os.spawnle(mode, path, ..., env)
   os.spawnlp(mode, file, ...)
   os.spawnlpe(mode, file, ..., env)
   os.spawnv(mode, path, args)
   os.spawnve(mode, path, args, env)
   os.spawnvp(mode, file, args)
   os.spawnvpe(mode, file, args, env)

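Interface to the scheduler

os.sched_get_priority_min(policy): Get the minimum priority value for policy. policy is one of the scheduling policy constants, such as os.SCHED_OTHER.
os.sched_get_priority_max(policy): Get the maximum priority value for policy. policy is one of the scheduling policy constants, such as os.SCHED_OTHER.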

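Miscellaneous system information

os.confstr(name):Return string-valued system configuration values.
os.cpu_count():Return the number of CPUs in the system. This number is not equivalent to the number of CPUs the current process can use. The number of usable CPUs can be obtained with len(os.sched_getaffinity(0)).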
os.curdir:The constant string used by the operating system to refer to the current directory.
os.pardir:The constant string used by the operating system to refer to the parent directory.
os.sep:The character used by the operating system to separate pathname components.

Random numbers

os.getrandom(size, flags=0):Get up to size random bytes.
os.urandom(size):Return a bytestring of size random bytes suitable for cryptographic use.

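sys module

About the sys module

This module provides access to some variables used or maintained by the interpreter and to functions that interact strongly with the interpreter.

Common sys attributes and functions

sys.argv: the list of command-line arguments; the first element is the path of the script itself
sys.exit(n): exit the program; use sys.exit(0) for a normal exit and a non-zero status such as sys.exit(1) for an error exit
sys.version: the version information of the Python interpreter
sys.path: the list of directories searched for modules, initialized from the PYTHONPATH environment variable plus installation-dependent defaults
sys.platform: the name of the operating system platform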
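The sketch below shows one common way to combine sys.argv and sys.exit(). The `main` function and its usage message are hypothetical; the argument list is passed in explicitly so that the example runs without real command-line arguments.

```python
import sys

def main(argv):
    """Return an exit status: 0 on success, 1 on a usage error."""
    if len(argv) < 2:
        print(f"usage: {argv[0]} NAME", file=sys.stderr)
        return 1
    print(f"hello, {argv[1]}")
    return 0

ok = main(["prog", "world"])  # prints "hello, world"
bad = main(["prog"])          # prints the usage message to stderr

# sys.exit() works by raising SystemExit, which carries the status code.
try:
    sys.exit(bad)
except SystemExit as exc:
    code = exc.code

print(ok, bad, code)  # 0 1 1
```

In a real script the last lines would typically be replaced by `sys.exit(main(sys.argv))`.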

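Serialization modules

What serialization is

Wikipedia describes serialization as follows: in data processing, it is the process of translating a data structure or object state into a format that can be stored (for example in a file or memory buffer) or transmitted (for example over a network) and reconstructed later, in the same or another computer environment.
In short, converting something like a dict or a list into a string (or a byte stream) is called serialization, and the reverse process is called deserialization.

Why serialize

To persist custom objects in some storage format
To pass objects from one place to another
To make programs easier to maintain

Overview of the serialization modules

json: JSON, specified by RFC 7159 (which obsoletes RFC 4627) and by ECMA-404, is a lightweight data interchange format inspired by JavaScript object literal syntax, although it is not a strict subset of JavaScript.
pickle: The pickle module implements binary protocols for serializing and de-serializing a Python object structure.
shelve: A "shelf" is a persistent, dictionary-like object. The difference with "dbm" databases is that the values (not the keys!) in a shelf can be essentially arbitrary Python objects, that is, anything the pickle module can handle.
marshal: This module contains functions that can read and write Python values in a binary format. It is not a general "persistence" module. For general persistence and for transferring Python objects through RPC calls, see the pickle and shelve modules.

Common usage of the serialization modules

json: converts between Python data types and JSON strings, which other languages with JSON support can also read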

# ---------------
# Serialization: dumps
# ---------------
import json

# Serialize a list (the tuple becomes a JSON array)
print(json.dumps(['foo', {'bar': ('baz', None, 1.0, 2)}]))
>>> ["foo", {"bar": ["baz", null, 1.0, 2]}]

# Serialize a string
print(json.dumps("\"foo\bar"))
>>> "\"foo\bar"

print(json.dumps('\u1234'))
>>> "\u1234"

print(json.dumps('\\'))
>>> "\\"

# Serialize a dict
print(json.dumps({"c": 0, "b": 0, "a": 0}, sort_keys=True))
>>> {"a": 0, "b": 0, "c": 0}

# ---------------
# Deserialization: loads
# ---------------
import json
print(json.loads('["foo", {"bar":["baz", null, 1.0, 2]}]'))
>>> ['foo', {'bar': ['baz', None, 1.0, 2]}]

json.loads('"\\"foo\\bar"')
>>> '"foo\x08ar'

# ---------------
# Serialization: dump (to a file)
# ---------------
import json
dic = {'k1':'v1','k2':'v2','k3':'v3'}
with open('json_file',mode='w') as f: # json.dump() writes str, so the file must be opened in text mode
    json.dump(dic,f)
# File contents:
>>> {"k1": "v1", "k2": "v2", "k3": "v3"}

# ---------------
# Deserialization: load (from a file)
# ---------------
import json
with open('json_file',mode='r') as f:
    dic_fs=json.load(f)
print(dic_fs)
>>> {'k1': 'v1', 'k2': 'v2', 'k3': 'v3'}

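shelve: provides a single open() function; the shelf is accessed by key and is used much like a dictionary

# ---------------
# Serialization (to a file)
# ---------------
import shelve
with shelve.open('shelve_file') as f:
    f['key'] = {1,2,3,4,5,6,7,8,9}

# ---------------
# Deserialization (from a file)
# ---------------
import shelve
with shelve.open('shelve_file') as f:
    data = f['key']
print(data)
>>> {1, 2, 3, 4, 5, 6, 7, 8, 9}

# ---------------
# Two additional methods
# ---------------
Shelf.sync(): Write back all entries in the cache if the shelf was opened with writeback set to True. Also empty the cache and synchronize the persistent dictionary on disk, if feasible. This is called automatically when the shelf is closed with close().
Shelf.close(): Synchronize and close the persistent dict object. Operations on a closed shelf will fail with a ValueError.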

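pickle: converts between Python objects (including Python-specific types) and byte streams

# ---------------
# Serialization: dumps
# ---------------
import pickle

# Serialize a list
# The byte strings shown in this section were produced with pickle protocol 3; other protocols (the default is 4 from Python 3.8) give different bytes.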
print(pickle.dumps(['foo', {'bar': ('baz', None, 1.0, 2)}]))
>>> b'\x80\x03]q\x00(X\x03\x00\x00\x00fooq\x01}q\x02X\x03\x00\x00\x00barq\x03(X\x03\x00\x00\x00bazq\x04NG?\xf0\x00\x00\x00\x00\x00\x00K\x02tq\x05se.'

# Serialize a string
print(pickle.dumps("\"foo\bar"))
>>> b'\x80\x03X\x07\x00\x00\x00"foo\x08arq\x00.'

print(pickle.dumps('\u1234'))
>>> b'\x80\x03X\x03\x00\x00\x00\xe1\x88\xb4q\x00.'

print(pickle.dumps('\\'))
>>> b'\x80\x03X\x01\x00\x00\x00\\q\x00.'

# Serialize a dict
print(pickle.dumps({"c": 0, "b": 0, "a": 0}))
>>> b'\x80\x03}q\x00(X\x01\x00\x00\x00cq\x01K\x00X\x01\x00\x00\x00bq\x02K\x00X\x01\x00\x00\x00aq\x03K\x00u.'

# ---------------
# Deserialization: loads
# ---------------
import pickle
print(pickle.loads(b'\x80\x03]q\x00(X\x03\x00\x00\x00fooq\x01}q\x02X\x03\x00\x00\x00barq\x03(X\x03\x00\x00\x00bazq\x04NG?\xf0\x00\x00\x00\x00\x00\x00K\x02tq\x05se.'))
>>> ['foo', {'bar': ('baz', None, 1.0, 2)}]

print(pickle.loads(b'\x80\x03X\x07\x00\x00\x00"foo\x08arq\x00.'))
>>> '"foo\x08ar'

# ---------------
# Serialization: dump (to a file)
# ---------------
import pickle
dic = {'k1':'v1','k2':'v2','k3':'v3'}
with open('pickle_file',mode='wb') as f:
    dic_fd=pickle.dump(dic,f)
# File contents (hex dump):
>>> 8003 7d71 0028 5802 0000 006b 3171 0158
0200 0000 7631 7102 5802 0000 006b 3271
0358 0200 0000 7632 7104 5802 0000 006b
3371 0558 0200 0000 7633 7106 752e 

# ---------------
# Deserialization: load (from a file)
# ---------------
import pickle
with open('pickle_file',mode='rb') as f:
    dic_fs=pickle.load(f)
print(dic_fs)
>>> {'k1': 'v1', 'k2': 'v2', 'k3': 'v3'}

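marshal: Python has a more primitive serialization module called marshal, but in general pickle should be the preferred way to serialize Python objects. marshal exists primarily to support Python's .pyc files.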

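`Pickle` vs `Json` vs `Marshal`

pickle vs marshal
- The pickle module keeps track of the objects it has already serialized, so that later references to the same object are not serialized again. marshal does not do this.
- This has implications both for recursive objects and for object sharing. Recursive objects are objects that contain references to themselves. They are not handled by marshal, and in fact attempting to marshal recursive objects will crash your Python interpreter. Object sharing happens when there are multiple references to the same object in different places in the object hierarchy being serialized. pickle stores such objects only once and ensures that all other references point to the master copy. Shared objects remain shared, which can be very important for mutable objects.
- marshal cannot be used to serialize user-defined classes and their instances. pickle can save and restore class instances transparently; however, the class definition must be importable and live in the same module as when the object was stored.
- The marshal serialization format is not guaranteed to be portable across Python versions. Because its primary job is to support .pyc files, the Python implementers reserve the right to change the format in backwards-incompatible ways should the need arise. The pickle serialization format is backwards compatible across Python releases, provided a compatible pickle protocol is chosen. If your data crosses the boundary between Python 2 and Python 3, the pickling and unpickling code also has to deal with the type differences between the two.
pickle vs json
- JSON is a text serialization format (it outputs unicode text, although most of the time it is then encoded to utf-8), while pickle is a binary serialization format;
- JSON is human-readable, while pickle is not;
- JSON is interoperable and widely used outside of the Python ecosystem, while pickle is Python-specific;
- JSON, by default, can only represent a subset of the Python built-in types and no custom classes; pickle can represent an extremely large number of Python types (many of them automatically, by clever use of Python's introspection facilities; complex cases can be tackled by implementing specific object APIs).
- Unlike pickle, deserializing untrusted JSON does not in itself create an arbitrary code execution vulnerability.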
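Two of the differences listed above (type coverage and object sharing) can be observed directly. This is a minimal sketch; the sample dictionary `data` is an assumption made up for the illustration.

```python
import json
import pickle

shared = [1, 2]
data = {"pair": (1, 2), "x": shared, "y": shared}

via_json = json.loads(json.dumps(data))
via_pickle = pickle.loads(pickle.dumps(data))

# JSON has no tuple type, so the tuple comes back as a list; pickle keeps it.
print(via_json["pair"])    # [1, 2]
print(via_pickle["pair"])  # (1, 2)

# pickle stores the shared list once, so both keys still refer to one object.
print(via_json["x"] is via_json["y"])      # False
print(via_pickle["x"] is via_pickle["y"])  # True

# pickle also handles recursive objects.
loop = []
loop.append(loop)
restored = pickle.loads(pickle.dumps(loop))
print(restored[0] is restored)  # True
```

The pickle round trips here are safe only because the bytes were produced in the same program; pickle.loads() should not be called on data from an untrusted source.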

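hashlib module

About the hashlib module

This module implements a common interface to many different secure hash and message digest algorithms. Included are the FIPS secure hash algorithms SHA1, SHA224, SHA256, SHA384, and SHA512 as well as RSA's MD5 algorithm. The terms "secure hash" and "message digest" are interchangeable: older algorithms were called message digests, and the modern term is secure hash.

What hashlib is for

A hash function turns data of arbitrary length into a fixed-length digest, which can be used to verify that the original data has not been tampered with and so to ensure data consistency.

Typical uses of hashlib

Computing message digests
Storing passwords as hashes instead of plaintext
Verifying file consistency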

hashlib examples

import hashlib

# Hash with md5
md5 = hashlib.md5()
md5.update(b'life is short')
print(md5.hexdigest())

>>> 617d2b938b9b59b347b92f19f84436bd

# Hash with sha256
m = hashlib.sha256()
m.update(b"Nobody inspects")
m.update(b" the spammish repetition")
m.digest()
>>> b'\x03\x1e\xdd}Ae\x15\x93\xc5\xfe\\\x00o\xa5u+7\xfd\xdf\xf7\xbcN\x84:\xa6\xaf\x0c\x95\x0fK\x94\x06'

# Hash with sha224
hashlib.sha224(b"Nobody inspects the spammish repetition").hexdigest()
>>> 'a4337bc45a8fc544c03f52dc550cd6e1e87021bc896588bd79e901e2'
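For the file-verification and password-storage uses mentioned earlier, the following sketch may help. The helper name `file_sha256`, the chunk size, the sample payload, and the iteration count are all assumptions chosen for the illustration; hashlib.pbkdf2_hmac() is the standard-library key-derivation function used here in place of a plain hash for passwords.

```python
import hashlib
import os
import tempfile

def file_sha256(path, chunk_size=65536):
    """Hash a file in chunks so that large files need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)  # repeated update() calls equal one call on the concatenation
    return h.hexdigest()

payload = b"life is short" * 1000
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.bin")
    with open(path, "wb") as f:
        f.write(payload)
    digest_from_file = file_sha256(path, chunk_size=1024)

digest_in_memory = hashlib.sha256(payload).hexdigest()
print(digest_from_file == digest_in_memory)  # True

# Password storage: a salted, deliberately slow derivation instead of a bare hash.
salt = os.urandom(16)
key1 = hashlib.pbkdf2_hmac("sha256", b"very-secret", salt, 100_000)
key2 = hashlib.pbkdf2_hmac("sha256", b"very-secret", salt, 100_000)
wrong = hashlib.pbkdf2_hmac("sha256", b"wrong-guess", salt, 100_000)
print(key1 == key2, key1 == wrong)  # True False
```

The salt has to be stored next to the derived key so that the same derivation can be repeated when a password is checked.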

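configparser module

About the configparser module

This module provides the ConfigParser class, which implements a basic configuration language with a structure similar to what is found in Microsoft Windows INI files. You can use it to write Python programs that end users can customize easily.

configparser examples

A typical configuration file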

[DEFAULT]
ServerAliveInterval = 45
Compression = yes
CompressionLevel = 9
ForwardX11 = yes

[bitbucket.org]
User = hg

[topsecret.server.com]
Port = 50022
ForwardX11 = no

Creating a configuration file with configparser

# Build the configuration file shown above with configparser

import configparser
config = configparser.ConfigParser()
config['DEFAULT'] = {
                        'ServerAliveInterval': '45',
                        'Compression': 'yes',
                        'CompressionLevel': '9'
                    }
config['bitbucket.org'] = {}
config['bitbucket.org']['User'] = 'hg'
config['topsecret.server.com'] = {}
topsecret = config['topsecret.server.com']
topsecret['Port'] = '50022'
topsecret['ForwardX11'] = 'no'
config['DEFAULT']['ForwardX11'] = 'yes'
with open('example.ini', 'w') as configfile:
    config.write(configfile)

Create, read, update, delete (CRUD) operations

# --------
# Create
# --------
import configparser

# Add a section with add_section() and an option with set()
config = configparser.ConfigParser()
config.add_section('new_added')
config.set('new_added','config_key_1','Value')
with open('sample_add.ini', "w") as f:
    config.write(f)

# --------
# Delete
# --------
import configparser

# Delete with remove_section() / remove_option()
# Assumes sample.ini exists and contains the sections named below
config = configparser.ConfigParser()
config.read('sample.ini')
config.remove_section('new_added')
config.remove_option('new_added_2',"config_key_2")
with open('sample_del.ini', "w") as f:
    config.write(f)

# --------
# Update
# --------
import configparser

# Read the file with read(), then overwrite a value with set()
# Assumes sample.ini exists and contains the section config_section_2
config = configparser.ConfigParser()
config.read('sample.ini')
config.set('config_section_2','user','python')
with open('sample_modi.ini', "w") as f:
    config.write(f)

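# --------
# Read
# --------
import configparser

# Query values with dict-like access
config = configparser.ConfigParser()
print(config.sections()) # [] because nothing has been read yet

config.read('example.ini')

print(config.sections()) # ['bitbucket.org', 'topsecret.server.com']
print('bitbucket.org' in config) # True
print(config['bitbucket.org']["user"]) # hg (option names are case-insensitive)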

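logging module

About the logging module

The logging module defines functions and classes that implement a flexible event logging system for applications and libraries.

The logging module in detail

Logging levels

Level	Numeric value
CRITICAL	50
ERROR	40
WARNING	30
INFO	20
DEBUG	10
NOTSET	0

Logging objects

Object type	Description
Logger	Logger: exposes the interface that application code calls directly and decides, based on its level and filters, which log messages to act upon.
LogRecord	Log record: created by a Logger for each logged event and passed to the appropriate handlers.
Handler	Handler: sends the log records (created by loggers) to the appropriate destination.
Filter	Filter: provides finer-grained control over which log records are output.
Formatter	Formatter: specifies the layout of log records in the final output.

Using basicConfig()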

import logging

logging.basicConfig(filename="logging.log", filemode="w", format="%(asctime)s %(name)s:%(levelname)s:%(message)s", datefmt="%y-%m-%d %H:%M:%S", level=logging.DEBUG)
logging.debug('This is a debug message')
logging.info('This is an info message')
logging.warning('This is a warning message')
logging.error('This is an error message')
logging.critical('This is a critical message')

# Written to the file logging.log
19-10-18 14:28:57 root:DEBUG:This is a debug message
19-10-18 14:28:57 root:INFO:This is an info message
19-10-18 14:28:57 root:WARNING:This is a warning message
19-10-18 14:28:57 root:ERROR:This is an error message
19-10-18 14:28:57 root:CRITICAL:This is a critical message

Using logging objects: this approach avoids the problems basicConfig() has with Chinese text and with writing to the console and a file at the same time, and is the recommended one

Note: never instantiate Logger directly; always obtain loggers through the module-level function logging.getLogger(name)

# To use different levels for console and file output, set the Logger to the lowest level and then give each of the two Handler objects its own level.

import logging
import logging.handlers

logger = logging.getLogger("logger")

handler_console = logging.StreamHandler()
handler_file = logging.FileHandler(filename="logging.log")

logger.setLevel(logging.DEBUG)  # level 10
handler_console.setLevel(logging.WARNING)  # level 30
handler_file.setLevel(logging.DEBUG)  # level 10

formatter = logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
handler_console.setFormatter(formatter)
handler_file.setFormatter(formatter)

logger.addHandler(handler_console)
logger.addHandler(handler_file)

# print(handler_console.level)  # 30
# print(handler_file.level)  # 10
# print(logger.level)  # 10

logger.debug('This is a customize debug message')
logger.info('This is an customize info message')
logger.warning('This is a customize warning message')
logger.error('This is an customize error message')
logger.critical('This is a customize critical message')

# Console output
2019-10-18 15:29:54,392 logger WARNING This is a customize warning message
2019-10-18 15:29:54,392 logger ERROR This is an customize error message
2019-10-18 15:29:54,392 logger CRITICAL This is a customize critical message

# File output
2019-10-13 15:30:13,417 logger DEBUG This is a customize debug message
2019-10-13 15:30:13,417 logger INFO This is an customize info message
2019-10-13 15:30:13,417 logger WARNING This is a customize warning message
2019-10-13 15:30:13,417 logger ERROR This is an customize error message
2019-10-13 15:30:13,417 logger CRITICAL This is a customize critical message
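The effect of per-handler levels can be checked without touching the console or the file system. In this sketch the two handlers write to in-memory buffers; the logger name `demo_levels` and the messages are hypothetical, and the timestamp is left out of the format so that the output is deterministic.

```python
import io
import logging

logger = logging.getLogger("demo_levels")  # hypothetical logger name
logger.setLevel(logging.DEBUG)             # the logger itself lets everything through
logger.propagate = False                   # keep the demo records away from the root logger

everything = io.StringIO()
warnings_only = io.StringIO()

h_all = logging.StreamHandler(everything)
h_all.setLevel(logging.DEBUG)              # level 10
h_warn = logging.StreamHandler(warnings_only)
h_warn.setLevel(logging.WARNING)           # level 30

formatter = logging.Formatter("%(name)s %(levelname)s %(message)s")
for h in (h_all, h_warn):
    h.setFormatter(formatter)
    logger.addHandler(h)

logger.debug("debug message")
logger.info("info message")
logger.warning("warning message")
logger.error("error message")

all_lines = everything.getvalue().splitlines()
warn_lines = warnings_only.getvalue().splitlines()

print(len(all_lines))  # 4
print(warn_lines)      # ['demo_levels WARNING warning message', 'demo_levels ERROR error message']
```

Each handler filters independently, so one logger call can reach some destinations and be dropped for others.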

pymysql module

About the pymysql module

This package contains a pure-Python MySQL client library, based on PEP 249.

pymysql usage example

Requirements & installation

# Requirements
Python – one of the following:
    CPython >= 2.7 or >= 3.5
    Latest PyPy
MySQL Server – one of the following:
    MySQL >= 5.5
    MariaDB >= 5.5

# Installation
python3 -m pip install PyMySQL

Basic usage

# Create the table in the database
CREATE TABLE `users` (
    `id` int(11) NOT NULL AUTO_INCREMENT,
    `email` varchar(255) COLLATE utf8_bin NOT NULL,
    `password` varchar(255) COLLATE utf8_bin NOT NULL,
    PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin
AUTO_INCREMENT=1 ;

# Connect to the database with PyMySQL
import pymysql.cursors

# Connect to the database
connection = pymysql.connect(host='localhost',
                             user='user',
                             password='passwd',
                             db='db',
                             charset='utf8mb4',
                             cursorclass=pymysql.cursors.DictCursor)

try:
    with connection.cursor() as cursor:
        # Create a new record
        sql = "INSERT INTO `users` (`email`, `password`) VALUES (%s, %s)"
        cursor.execute(sql, ('webmaster@python.org', 'very-secret'))

    # connection is not autocommit by default. So you must commit to save
    # your changes.
    connection.commit()

    with connection.cursor() as cursor:
        # Read a single record
        sql = "SELECT `id`, `password` FROM `users` WHERE `email`=%s"
        cursor.execute(sql, ('webmaster@python.org',))
        result = cursor.fetchone()
        print(result)
finally:
    connection.close()

# Output
{'password': 'very-secret', 'id': 1}

redis module

About the redis module

The Python interface to the Redis key-value store.

redis usage example

Requirements & installation

# Requirements
redis-py 3.0 supports Python 2.7 and Python 3.5+.

# Installation
pip install redis

Basic usage

import redis

r = redis.Redis(host='localhost', port=6379, db=0)
r.set('foo', 'bar')
>>> True

r.get('foo')
>>> b'bar'

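PyMongo module

About the PyMongo module

PyMongo is a Python distribution containing tools for working with MongoDB, and is the recommended way to work with MongoDB from Python.

PyMongo usage example

Requirements & installation

# Requirements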
PyMongo supports MongoDB 2.6, 3.0, 3.2, 3.4, 3.6, 4.0 and 4.2.
PyMongo supports CPython 2.7, 3.4+, PyPy, and PyPy3.5+.

# Installation
python -m pip install pymongo[snappy,gssapi,srv,tls,zstd]

Basic usage

import pymongo

client = pymongo.MongoClient("localhost", 27017)
db = client.test
db.name
>>> u'test'


db.my_collection
>>> Collection(Database(MongoClient('localhost', 27017), u'test'), u'my_collection')


db.my_collection.insert_one({"x": 10}).inserted_id
>>> ObjectId('4aba15ebe23f6b53b0000000')


db.my_collection.insert_one({"x": 8}).inserted_id
>>> ObjectId('4aba160ee23f6b543e000000')


db.my_collection.insert_one({"x": 11}).inserted_id
>>> ObjectId('4aba160ee23f6b543e000002')


db.my_collection.find_one()
>>> {u'x': 10, u'_id': ObjectId('4aba15ebe23f6b53b0000000')}


for item in db.my_collection.find():
    print(item["x"])
>>> 10
>>> 8
>>> 11


db.my_collection.create_index("x")
>>> u'x_1'


for item in db.my_collection.find().sort("x", pymongo.ASCENDING):
    print(item["x"])
>>> 8
>>> 10
>>> 11


[item["x"] for item in db.my_collection.find().limit(2).skip(1)]
>>> [8, 11]

Permalink: https://elijahyg.github.io/2020/05/10/Python常用的模块&包/

License: This work is licensed under the CC BY-NC-SA 4.0 International License.