01-evolution_IO

  • Redis: REmote DIctionary Server

1. Concept

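1. Disk

  1. Addressing: milliseconds (ms)
  2. Bandwidth: on the order of MB/s to GB/s

2. Memory

  1. Addressing: nanoseconds (ns)
  2. Bandwidth: very high
  3. Memory is a linear address space

Second > millisecond > microsecond > nanosecond. For addressing, disk is roughly 100,000 times slower than memory.

3. I/O buffer

  1. A hard disk is organized into tracks and sectors. One sector is 512 bytes; with units that small, the cost of indexing them grows
  2. 4K alignment. However little you ask for, the operating system fetches at least 4 KB from disk at a time, so the index has fewer entries to track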
2. Evolution

1. DB + index

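Relational databases

  1. Creating a table: a schema must be given up front
  2. Types: each type fixes a byte width
  3. Storage: tends toward row-oriented storage (space is reserved even when a field is empty)

Does performance drop when a table gets very large? (Assuming the table has indexes)

  1. Inserts, deletes, and updates get slower, because the indexes must be maintained
  2. What about query speed?
    1. A single query, or a small number of queries, is still fast
    2. Under high concurrency, speed is limited by disk bandwidth (many data pages have to be loaded into memory at the same time)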

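2. In-memory DB

  1. SAP HANA: a relational database that runs entirely in memory, 2 TB. Too expensive
  2. The same data takes a different amount of space on disk and in memory; in memory it is smaller than on disk

3. DB + cache

Two pieces of infrastructure

  1. Hardware built on the von Neumann architecture
  2. Ethernet and TCP/IP networking (unreliable; combining several technologies inevitably brings problems such as data consistency and double writes)

A disk-based database (left) is too slow; an in-memory database (right) is too expensive. The compromise that emerged is a cache: memcached, Redis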


3. DB-ENGINES

Architect

  1. Technology selection
  2. Technology comparison

1. DB-Engines Ranking


2. MySQL Redis Systems

  • Single machine: 100,000 ops
  • Over socket IO: 60,000 to 70,000 ops

4. Redis

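Program = algorithms + data structures

  1. Redis is an open source (BSD licensed), in-memory data structure store that can be used as a database, cache, and message broker
  2. It supports many types of data structures
    1. Strings (string)
      1. Characters
      2. Numbers
      3. bitmaps
    2. Hashes (hash)
    3. Lists (list)
    4. Sets (set)
    5. Sorted sets (sorted_set)
      • With range queries; plus bitmaps, hyperloglogs, and geospatial indexes with radius queries
  3. Redis has built-in replication, Lua scripting (Lua_scripting), LRU eviction (LRU_eviction), transactions, and different levels of on-disk persistence, and provides high availability (high_availability) through Redis Sentinel and automatic partitioning with Redis Cluster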

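1. Compared with memcached

  • memcached: a value has no notion of type. After fetching the data, the client has to parse it itself
  • Redis: the computation moves to the data. The server operates on the typed value and returns just the result, so the client does no parsing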
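
The difference can be sketched with two toy in-process stores. Neither class is a real client library: `OpaqueStore` and `TypedStore` are hypothetical stand-ins, and the method names `rpush` and `lindex` only borrow the names of the Redis commands RPUSH and LINDEX.

```python
import json


class OpaqueStore:
    """memcached-style: values are opaque bytes; the client must parse them."""

    def __init__(self):
        self._data = {}

    def set(self, key, raw: bytes):
        self._data[key] = raw

    def get(self, key) -> bytes:
        return self._data[key]


class TypedStore:
    """Redis-style: the store knows the value is a list and can operate on it."""

    def __init__(self):
        self._lists = {}

    def rpush(self, key, *items):
        self._lists.setdefault(key, []).extend(items)

    def lindex(self, key, i):
        return self._lists[key][i]


items = ["a", "b", "c"]

opaque = OpaqueStore()
opaque.set("k", json.dumps(items).encode())
# The client fetches the whole value and decodes it just to read one element.
whole = opaque.get("k")
first_via_opaque = json.loads(whole)[0]

typed = TypedStore()
typed.rpush("k", *items)
# The indexing runs where the data lives; only one element comes back.
first_via_typed = typed.lindex("k", 0)

print(first_via_opaque, first_via_typed)
```

Both paths give the same answer; what differs is how many bytes cross the wire and who does the parsing.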

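2. Installing Redis

3. Logging

  1. In redis.conf, the log destination is set with logfile ""
    • logfile defaults to the empty string. With an empty string, Redis logs to standard output; with an empty string and Redis running in the background (daemonized), logs are sent to /dev/null
    • On Unix-like systems, /dev/null is known as the null device or black hole: a special device file that discards everything written to it while reporting the write as successful
  2. Log level
    • Configured in redis.conf. loglevel warning is enough. From lowest to highest, the levels are: debug, verbose, notice, warning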
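5. A Brief Look at IO

Synchronous, non-blocking, multiplexed

  1. Single process, single thread, single instance. Thread-safe. Ordering: commands are executed in order within each socket connection
  2. With many concurrent requests, how does it stay fast?
  • nginx is also synchronous, non-blocking, and multiplexed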

1. BIO

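Blocking IO: the synchronous, blocking era. The server processes data far faster than clients send it, so it spends a long time blocked

  1. One connection is one fd (file descriptor)
    • In the JVM, each thread costs about 1 MB of memory, and more threads also mean more scheduling overhead
  2. In this era sockets are blocking: read() does not return until data arrives, so the thread sits in the blocked state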
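
The thread-per-connection model described above might look roughly like this. It is a sketch under simplifying assumptions: `socket.socketpair()` stands in for a connection that a real server would obtain from `accept()`, and `handle` and `demo` are illustrative names.

```python
import socket
import threading


def handle(conn: socket.socket) -> None:
    """Serve one connection; recv() blocks this thread until data arrives."""
    with conn:
        while True:
            data = conn.recv(1024)      # blocking read
            if not data:                # peer closed the connection
                break
            conn.sendall(data.upper())


def demo() -> bytes:
    # A connected pair stands in for one accepted client connection.
    server_side, client_side = socket.socketpair()
    worker = threading.Thread(target=handle, args=(server_side,))
    worker.start()                      # one dedicated thread per connection
    client_side.sendall(b"ping")
    reply = client_side.recv(1024)
    client_side.close()                 # makes the worker's recv() return b""
    worker.join()
    return reply


print(demo())
```

Every additional connection needs another thread that spends most of its life parked inside `recv()`, which is the cost the notes attribute to this era.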
# On Linux: man is the help program, man-pages provides the manual pages
yum install man man-pages

# man pages are divided into 8 sections. Section 2 is system calls: the functions the kernel exposes to programs
[root@hecs-168322 ~]# man ls
---------------------------------------------------------------------
LS(1)              User Commands               LS(1)

NAME
       ls - list directory contents # section 1: user commands

SYNOPSIS
       ls [OPTION]... [FILE]...
       # ...

[root@hecs-168322 ~]# man 2 read
---------------------------------------------------------------------
READ(2)        Linux Programmer's Manual        READ(2)

NAME
       read - read from a file descriptor # on Linux everything is a file; fd = file descriptor
       # ...

[root@hecs-168322 ~]# ps -ef | grep redis
root     32668 28127  0 10:58 pts/0    00:00:00 ./redis-server *:6379
root     32693 32673  0 10:59 pts/1    00:00:00 grep --color=auto redis

# 1. Every process has its own fd directory under /proc/<pid>/fd
[root@hecs-168322 ~]# cd /proc/32668/fd
[root@hecs-168322 fd]# ll
total 0
lrwx------ 1 root root 64 Apr  9 11:00 0 -> /dev/pts/0         # 0: standard input
lrwx------ 1 root root 64 Apr  9 11:00 1 -> /dev/pts/0         # 1: standard output
lrwx------ 1 root root 64 Apr  9 10:59 2 -> /dev/pts/0         # 2: standard error
lr-x------ 1 root root 64 Apr  9 11:00 3 -> pipe:[189200]
l-wx------ 1 root root 64 Apr  9 11:00 4 -> pipe:[189200]
lrwx------ 1 root root 64 Apr  9 11:00 5 -> anon_inode:[eventpoll]
lrwx------ 1 root root 64 Apr  9 11:00 6 -> socket:[189203]
lrwx------ 1 root root 64 Apr  9 11:00 7 -> socket:[189204]
[root@hecs-168322 fd]# cd /proc/$$/fd
[root@hecs-168322 fd]# ll
total 0
lrwx------ 1 root root 64 Apr  9 10:59 0 -> /dev/pts/1
lrwx------ 1 root root 64 Apr  9 10:59 1 -> /dev/pts/1
lrwx------ 1 root root 64 Apr  9 10:59 2 -> /dev/pts/1
lrwx------ 1 root root 64 Apr  9 11:00 255 -> /dev/pts/1
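
The same listing can be produced from inside a process. This sketch assumes a Linux system with /proc mounted; `open_fds` is an illustrative helper name.

```python
import os


def open_fds() -> dict:
    """Map each open fd of this process to what it points at (Linux /proc only)."""
    result = {}
    for name in os.listdir("/proc/self/fd"):
        try:
            result[int(name)] = os.readlink(f"/proc/self/fd/{name}")
        except OSError:
            # The fd used to scan the directory is already closed by now.
            pass
    return result


read_fd, write_fd = os.pipe()
fds = open_fds()
os.close(read_fd)
os.close(write_fd)

# Both ends of the pipe show up as "pipe:[<inode>]" with the same inode.
print(fds[read_fd], fds[write_fd])
```

This mirrors fds 3 and 4 of the redis-server process above, which are the two ends of one pipe.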

2. NIO

  • Non-blocking IO: the synchronous, non-blocking era
man 2 socket
---------------------------------------------------------------------
SOCKET(2)                   Linux Programmer's Manual              SOCKET(2)

NAME
       socket - create an endpoint for communication

DESCRIPTION
       socket() creates an endpoint for communication and returns a descriptor.

       Since Linux 2.6.27, the type argument serves a second purpose: in addition to specifying a  socket  type,
       it may include the bitwise OR of any of the following values, to modify the behavior of socket():

	   # 1. The socket() system call returns an fd. type: SOCK_NONBLOCK => a non-blocking socket
       SOCK_NONBLOCK   Set  the  O_NONBLOCK  file status flag on the new open file description.  Using this flag
                       saves extra calls to fcntl(2) to achieve the same result.

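       SOCK_CLOEXEC    Set the close-on-exec (FD_CLOEXEC) flag on the new file descriptor.  See the  description
                       of the O_CLOEXEC flag in open(2) for reasons why this may be useful.

  1. With non-blocking sockets, a single thread/process can simply loop over the fds calling read() on each. The polling happens in user space
  2. The problem is the system calls inside the loop: with N connections, every pass costs N read() calls, most of which return no data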
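
The polling loop and its cost can be sketched as follows. The 50 connections are simulated with `socket.socketpair()`, and `poll_once` is an illustrative name; the count of `recv()` calls stands in for the number of read() system calls.

```python
import socket


def poll_once(conns):
    """One pass of the user-space loop: try to read every fd, ready or not."""
    ready = []
    calls = 0
    for conn in conns:
        calls += 1
        try:
            data = conn.recv(1024)      # returns at once on a non-blocking socket
        except BlockingIOError:         # EAGAIN: nothing to read yet, move on
            continue
        ready.append(data)
    return ready, calls


# 50 connected pairs; only one peer actually sends anything.
pairs = [socket.socketpair() for _ in range(50)]
servers = [s for s, _ in pairs]
for s in servers:
    s.setblocking(False)
pairs[7][1].sendall(b"hello")

ready, calls = poll_once(servers)
print(len(ready), calls)                # one fd had data, yet 50 reads were issued

for a, b in pairs:
    a.close()
    b.close()
```

One thread now serves all connections, but the number of system calls per pass grows with the number of connections rather than with the number that have data.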

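3. Multiplexing: select_NIO

  1. The kernel evolved and added the select() system call. It checks fds in bulk, so read() is called only on the fds that have data
  2. New problem: the fd set is copied between user space and kernel space on every call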
man 2 select
---------------------------------------------------------------------
SELECT(2)                       Linux Programmer's Manual                         SELECT(2)

NAME
       select, pselect, FD_CLR, FD_ISSET, FD_SET, FD_ZERO - synchronous I/O multiplexing

SYNOPSIS
       /* According to POSIX.1-2001 */
       #include <sys/select.h>

       /* According to earlier standards */
       #include <sys/time.h>
       #include <sys/types.h>
       #include <unistd.h>

       int select(int nfds, fd_set *readfds, fd_set *writefds,
                  fd_set *exceptfds, struct timeval *timeout);
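
A minimal sketch of the same 50-connection scenario using select(), through Python's `select.select` wrapper. The connections are again simulated with `socket.socketpair()`.

```python
import select
import socket

pairs = [socket.socketpair() for _ in range(50)]
servers = [s for s, _ in pairs]
pairs[7][1].sendall(b"hello")

# One select() call hands the whole fd list to the kernel...
readable, _, _ = select.select(servers, [], [], 1.0)
# ...and read is then issued only for the fds reported as ready.
received = [conn.recv(1024) for conn in readable]
print(len(readable), received)

for a, b in pairs:
    a.close()
    b.close()
```

The full list of fds is still passed in on every call, which is the repeated user/kernel copy noted above.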

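4. Multiplexing: epoll_NIO

  1. epoll_create() creates an epoll fd. The kernel prepares a space for the registered fds, kept as a red-black tree. (It is often described as an mmap-shared region; in the mainline Linux implementation the tree lives in kernel memory and epoll_wait() copies the ready events out to the caller)
  2. epoll_ctl() adds fds to, and removes them from, the red-black tree
  3. epoll_wait() waits for events
  4. The kernel works out which fds are ready for read() and puts them on a linked list (the ready list)
man epoll # section 7 (miscellaneous); it covers 3 system calls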
---------------------------------------------------------------------
EPOLL(7)                                Linux Programmer's Manual                                EPOLL(7)

NAME
       epoll - I/O event notification facility

SYNOPSIS
       #include <sys/epoll.h>

DESCRIPTION
       The  epoll  API performs a similar task to poll(2): monitoring multiple file descriptors to see if
       I/O is possible on any of them.  The epoll API can be used either as an edge-triggered or a level-
       triggered  interface  and scales well to large numbers of watched file descriptors.  The following
       system calls are provided to create and manage an epoll instance:
			 # 1. epoll_create(2)
       *  epoll_create(2) creates an epoll instance and returns  a  file  descriptor  referring  to  that
          instance.  (The more recent epoll_create1(2) extends the functionality of epoll_create(2).)
			 # 2. epoll_ctl(2)
       *  Interest  in  particular file descriptors is then registered via epoll_ctl(2).  The set of file
          descriptors currently registered on an epoll instance is sometimes called an epoll set.
			 # 3. epoll_wait(2)
       *  epoll_wait(2) waits for I/O events, blocking the calling thread  if  no  events  are  currently
          available.
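
The three calls map onto Python's `select.epoll` object roughly as shown below. This is a Linux-only sketch, with the connections once more simulated by `socket.socketpair()`.

```python
import select
import socket

pairs = [socket.socketpair() for _ in range(50)]
by_fd = {s.fileno(): s for s, _ in pairs}

ep = select.epoll()                     # create the epoll instance (epoll_create step)
for fd in by_fd:
    ep.register(fd, select.EPOLLIN)     # epoll_ctl step: register each fd once

pairs[7][1].sendall(b"hello")

events = ep.poll(1.0)                   # epoll_wait step: only ready fds come back
received = [by_fd[fd].recv(1024) for fd, mask in events if mask & select.EPOLLIN]
print(len(events), received)

ep.close()
for a, b in pairs:
    a.close()
    b.close()
```

Unlike select(), the interest set is registered once and kept by the kernel, so each wait passes no fd list in and gets back only the ready entries.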

1. Shared space: mmap

  • A memory region shared between user space and kernel space, which cuts down on repeated copying
man 2 mmap
---------------------------------------------------------------------
MMAP(2)                     Linux Programmer's Manual                          MMAP(2)

NAME
       mmap, munmap - map or unmap files or devices into memory
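
A small sketch of mapping a file into memory with Python's `mmap` module. The temporary file and its contents are made up for the example.

```python
import mmap
import os
import tempfile

with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"hello world")
    path = f.name

with open(path, "r+b") as f:
    with mmap.mmap(f.fileno(), 0) as m:     # length 0 maps the whole file
        m[0:5] = b"HELLO"                   # a plain memory write, no write() call
        m.flush()                           # push the change back to the file

with open(path, "rb") as f:
    content = f.read()
os.remove(path)

print(content)                              # b'HELLO world'
```

The change made through the mapping is visible when the file is read back normally, because both paths go through the same pages.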

6. 0_Copy

# sendfile() system call: takes an output fd and an input fd
man 2 sendfile
---------------------------------------------------------------------
SENDFILE(2)                   Linux Programmer's Manual                     SENDFILE(2)

NAME
       sendfile - transfer data between file descriptors

SYNOPSIS
       #include <sys/sendfile.h>

       ssize_t sendfile(int out_fd, int in_fd, off_t *offset, size_t count);

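  1. NIC to kernel is socket_IO
  2. file.txt to kernel is file_IO
  3. Without sendfile(), file.txt is first copied into a kernel buffer, then into user-space memory, and then back into the kernel to be sent
  4. The sendfile() system call skips the back-and-forth copying: the kernel sends the data straight from the input fd to the output fd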
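
A sketch of step 4 using `os.sendfile`, Python's wrapper around the system call. It assumes Linux; the temporary file stands in for file.txt, and a `socket.socketpair()` stands in for a real network connection.

```python
import os
import socket
import tempfile

payload = b"hello from file.txt\n"

with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(payload)
    path = f.name

receiver, sender = socket.socketpair()
with open(path, "rb") as src:
    # out_fd is the socket, in_fd is the file; the bytes never enter this process.
    sent = os.sendfile(sender.fileno(), src.fileno(), 0, len(payload))
sender.close()

received = b""
while True:
    chunk = receiver.recv(4096)
    if not chunk:                           # sender closed: everything has arrived
        break
    received += chunk
receiver.close()
os.remove(path)

print(sent, received)
```

The program never calls read() on the file or holds the payload in a user-space buffer on the sending side; it only names the two fds.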

1. kafka builds on 0_Copy

  1. kafka runs on the JVM; it is a user-space application
  2. mmap reduces system calls and reduces data copies