Wei Zhe: How Should Startups Improve Efficiency?

Background

I came across a transcript of a speech by Wei Zhe and found it very inspiring, so I'm reposting it here to study.

Original Text

I appear at all kinds of events every year, but no topic has ever excited me as much as "efficiency." Why?

In more than five years of investing, I have received countless BPs (business plans), and they invariably talk about "how big my scale is," "how big I am today," "how big I will become," and "how fast I am growing."

But no one ever says in a BP: here is how efficient I am. How efficient I am today, and how efficient I will remain as I get faster and bigger.

The internet's greatest contribution is improving efficiency. **An internet company that does not generate $100,000 of profit per employee is not a real internet company.** Behind scale and speed lies efficiency.

Don't be fooled by the internet veneer. The essence of business is efficiency as well as growth. Growth without efficiency is not slow suicide; it is accelerated suicide.

Today we will discuss how to improve efficiency in five parts.


Troubleshooting an Inaccessible HTTPS Site

Background

I run an HTTPS website. After some recent tinkering I found it had become inaccessible. I tested it in various ways and narrowed down the problem step by step; this post records the process.

Check Whether the Certificate Is Valid

My first suspect was the certificate. I hadn't checked it in a while and didn't know whether it had expired, so I inspected it with the following command:

openssl x509 -noout -text -in xxx.com.crt

The output looks like this:

Certificate:
Data:
Version: 3 (0x2)
Serial Number:
xxxxxxx
Signature Algorithm: sha384WithRSAEncryption
Issuer: C = CN, O = "TrustAsia Technologies, Inc.", CN = TrustAsia RSA DV TLS CA G2
Validity
Not Before: Jan 29 00:00:00 2024 GMT
Not After : Jan 28 23:59:59 2025 GMT

The certificate is fine; it doesn't expire until January 2025.
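Rather than reading the `Not After` date by eye, `openssl x509 -checkend` can test validity directly. A minimal sketch, using a throwaway self-signed certificate so it runs anywhere (the CN and paths are placeholders, not the real site):

```shell
# Create a throwaway self-signed cert valid for 30 days (placeholder CN/paths)
openssl req -x509 -newkey rsa:2048 -nodes -subj "/CN=demo.example" -days 30 \
  -keyout /tmp/demo.key -out /tmp/demo.crt 2>/dev/null
# -checkend N exits 0 if the cert is still valid N seconds from now
if openssl x509 -checkend 86400 -noout -in /tmp/demo.crt >/dev/null; then
  echo "certificate valid for at least one more day"
fi
```

For the real site, point `-in` at `xxx.com.crt`; a non-zero exit status means the certificate expires within the window.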

Check Whether the Certificate Can Be Reached

Use the following command:

openssl s_client -connect xxx.com:443 -showcerts

This time it reported an error:

4087E3AE1B7F0000:error:0A00010B:SSL routines:ssl3_get_record:wrong version number:../ssl/record/ssl3_record.c:354:

At this point it was almost certainly a network issue. After some careful digging, it turned out I had forgotten to switch the frp configuration to the https protocol, so the forwarding was broken. After fixing the config and restarting frp, the problem was solved.
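The "wrong version number" error generally means the endpoint answered with plain HTTP (or some other non-TLS bytes) instead of a TLS handshake, which is exactly what a misconfigured forwarder produces. As a sketch, you can reproduce the error locally by pointing a TLS client at a plain-HTTP listener (assuming `python3` and `openssl` are available and port 8080 is free):

```shell
# Start a throwaway plain-HTTP server (no TLS) on a placeholder port
python3 -m http.server 8080 >/dev/null 2>&1 &
srv=$!
sleep 1
# s_client sends a TLS ClientHello, gets an HTTP response back, and fails
# with the same "wrong version number" error seen above
openssl s_client -connect 127.0.0.1:8080 </dev/null 2>&1 | grep -o -m 1 "wrong version number"
kill $srv
```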

View Detailed SSL Information

Once the problem is fixed, you can also run the site through [https://www.ssllabs.com/ssltest/](https://www.ssllabs.com/ssltest/), which produces a very thorough report.

Viewing Linux User Login Logs

1. lastlog

lastlog shows the most recent login of every user on the system:

(base) root@VM-16-4-debian:~# lastlog
Username Port From Latest
root pts/0 10.10.75.10 Sun Mar 24 17:13:34 +0800 2024
daemon                                     Never logged in
bin                                        Never logged in
sys                                        Never logged in

2. last

last lists the users who are currently logged in and those who have logged in previously:

(base) root@VM-16-4-debian:~# last
root pts/0 10.10.75.10 Sun Mar 24 17:13 still logged in
root pts/0 10.10.75.10 Fri Mar 22 22:31 - 01:21 (02:49)
root pts/0 10.10.75.10 Fri Mar 22 20:07 - 22:31 (02:24)
root pts/1 10.10.75.10 Fri Mar 22 18:28 - 20:53 (02:25)

3. lastb

lastb lists failed login attempts (it reads /var/log/btmp, so it usually requires root):

(base) root@VM-16-4-debian:~# lastb -50
root ssh:notty 106.55.187.66 Sun Mar 24 17:28 - 17:28 (00:00)
root ssh:notty 183.15.207.112 Sun Mar 24 17:28 - 17:28 (00:00)
root ssh:notty 101.43.137.100 Sun Mar 24 17:28 - 17:28 (00:00)
root ssh:notty 43.156.133.218 Sun Mar 24 17:28 - 17:28 (00:00)
test1 ssh:notty 170.64.190.204 Sun Mar 24 17:28 - 17:28 (00:00)
test1 ssh:notty 170.64.190.204 Sun Mar 24 17:28 - 17:28 (00:00)
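Output like this is handy for spotting brute-force sources. A small sketch that counts attempts per source IP; it feeds a here-doc sample through the pipeline so it runs without root access to /var/log/btmp, but in practice you would pipe `lastb` itself into the `awk` stage:

```shell
# Count failed-login attempts per source IP (field 3 of lastb output);
# the top line of the result is the most active source
awk '{print $3}' <<'EOF' | sort | uniq -c | sort -rn
root  ssh:notty  106.55.187.66  Sun Mar 24 17:28 - 17:28 (00:00)
test1 ssh:notty  170.64.190.204 Sun Mar 24 17:28 - 17:28 (00:00)
test1 ssh:notty  170.64.190.204 Sun Mar 24 17:28 - 17:28 (00:00)
EOF
```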

Data Warehouse Construction Standards - Data Modeling Standards

3.1 Data Modeling Standards

a. Horizontal Layering:

  • Explanation

    Our layered design is a core element of our data architecture strategy; adhering to strict standards during the modeling phase yields the best results.

  • Layer Standards

    • The Operational Data Store (ODS)
      • Design Methodology:
        • Store raw data with minimal processing in the data warehouse system, maintaining a structure that is similar to the source system.
      • Main Functions:
        • Synchronize and store foundational data in the data warehouse to address data silo issues and ensure the integrity of data integration.
        • Ensure data persistence by keeping tables and data fully in sync with the source.
        • Perform regular synchronization and add synchronization timestamps to the tables to capture their temporal variability.
      • Data Processing:
        • Handling of anomalies and erroneous data.
      • Naming Convention:
        • <layer abbreviation>_<source system>_<source table name>
        • Example: ods_oms_order
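The naming convention above can be enforced mechanically. A quick sketch; `valid_ods_name` is a hypothetical helper for illustration, not part of any standard tooling:

```shell
# Validate a table name against the ODS convention:
# <layer abbreviation>_<source system>_<source table name>
valid_ods_name() {
  case "$1" in
    ods_*_*) echo "ok: $1" ;;
    *)       echo "bad: $1" ;;
  esac
}
valid_ods_name ods_oms_order   # follows the convention
valid_ods_name order_table     # missing layer and source-system prefixes
```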

Data Warehouse Construction Standards - What

3. What are the key components of data warehouse standards?

a. Data Modeling Standards:

These standards define the guidelines for designing and structuring the data models used in the data warehouse. They include naming conventions, entity-relationship diagrams, data type definitions, and relationships between tables.

b. Data Integration Standards:

These standards focus on the processes and methods used to extract, transform, and load (ETL) data into the data warehouse. They cover data extraction techniques, data cleansing procedures, transformation rules, and data loading strategies.

c. Data Quality Standards:

These standards ensure the accuracy, consistency, completeness, and validity of data within the data warehouse. They include data profiling, data validation rules, data cleansing methodologies, and data quality metrics.

d. Metadata Standards:

Metadata standards define the structure and format of metadata stored in the data warehouse. They cover metadata definitions, metadata repositories, metadata integration, and metadata management processes.

e. Security and Access Control Standards:

These standards focus on protecting the data warehouse from unauthorized access, ensuring data privacy, and enforcing data security policies. They include user authentication, authorization mechanisms, encryption techniques, and data masking methods.

f. Performance and Scalability Standards:

These standards address the performance optimization and scalability aspects of the data warehouse. They cover query optimization techniques, indexing strategies, partitioning schemes, and data archiving processes.

g. Data Governance Standards:

Data governance standards define the overall framework and processes for managing and governing data within the data warehouse. They include data stewardship, data ownership, data lifecycle management, and data governance policies.

h. Documentation Standards:

These standards ensure the documentation of all aspects of the data warehouse, including data models, ETL processes, data dictionaries, and data lineage. They promote understanding, maintainability, and ease of future enhancements.

Data Warehouse Construction Standards - How

2. How to Implement Data Warehouse Standards

a. Standard formulation

This involves defining and documenting the specific rules, guidelines, and best practices that will govern the data warehouse. It is essential to involve stakeholders from different teams and departments to ensure comprehensive coverage and alignment with business requirements.

b. Standard discussion

Once the standards are formulated, it is important to engage in discussions with relevant teams and individuals. This can include conducting workshops, training sessions, or meetings to explain the standards, address any questions or concerns, and gather feedback.

c. Standard implementation

After the standards have been discussed and finalized, they need to be effectively communicated and implemented across the organization. This can involve creating documentation, providing training, and establishing processes and tools to support adherence to the standards.

d. Standard enforcement and supervision

It is crucial to have mechanisms in place to monitor and enforce compliance with the data warehouse standards. This can include regular audits, performance reviews, and ongoing communication and support to ensure that the standards are being followed consistently.

e. Standard refinement

Data warehouse standards should be treated as living documents that are continuously reviewed and refined based on feedback, evolving business needs, and technological advancements. Regular evaluations and updates should be conducted to ensure that the standards remain relevant and effective.

Data Warehouse Construction Standards - Why

1. Why We Need Data Warehouse Standards

The adage “No rules, no success” underscores the critical importance of established standards in ensuring optimal team performance and high-quality deliverables. In the absence of such standards, operational efficiency and collaboration may suffer, leading to potentially chaotic outcomes.

Have you encountered similar issues in your work?

  • Received a requirement and not sure which table to pull data from. Table A seems feasible, while table B also appears to work. Asked colleague A, and they said they always pull from table C. Spent a long time exploring these three tables but couldn’t match them up. Oh well, let me calculate from the source again, and then a new table D appeared.
  • I’ve noticed that there are thousands of tables in our database, but I only use a handful. So, what’s the point of all these other ones? I asked my colleagues, but nobody seems to know. Should I just get rid of them? Nobody else is touching them anyway.
  • I got tasked with investigating an error in our process after my boss asked me to take a look. But man, the code is a total mess! I can’t make heads or tails of it. Plus, I’ve been searching for what feels like forever but still can’t find the upstream dependencies. What a headache!
  • My coworker bailed on our project, and now I’m stuck with their share of the work. I’ve been grinding away for weeks, but I just can’t seem to wrap my head around it. It’s like they dumped a ton of unfinished business on me, and I’m feeling pretty frustrated myself now. Maybe it’s time for me to look for a new gig too…

Our data warehouse team’s performance has taken a hit due to all the issues we’ve faced lately. Efficiency, output quality, job satisfaction – you name it. And let me tell you, it’s usually the hardest-working and most loyal employees who bear the brunt of all these problems. It’s just not right.

If you’ve ever worked in data development, you know the pain I’m talking about. I mean, who hasn’t experienced some of these frustrations, right? So, what’s going on here? In my humble opinion, it all boils down to a lack of standards or proper implementation. And hey, I get it – sometimes business demands are tight, and shortcuts gotta be taken. But, that technical debt better be paid off pronto. Blaming employees for that ain’t cool. Leadership needs to own up to it.

Think of a data warehouse like a digital construction project – it’s the intangible result of our data engineers’ hard work. Data standards are like the blueprints for building this system, serving as both the instruction manual and translator for data usage. And just like how you need quality control in construction, we gotta ensure data quality too. But here’s the thing – for our data system to really thrive, we need to move away from relying on individual judgment and toward standardized, tool-driven management. That way, we can scale sustainably and keep the system healthy.

Deploying Kibana in Docker (Building an ELK Logging Stack from Scratch)

What Is Kibana?

Kibana is an open-source analytics and visualization platform designed to work with Elasticsearch.

You can use Kibana to search and view data stored in Elasticsearch. Kibana presents that data through a variety of charts, tables, maps, and so on, visualizing it intuitively for advanced data analysis.

Deploying Kibana

  1. Pull the kibana image

    sudo docker pull kibana:7.7.1
  2. Get the elasticsearch container's IP

    sudo docker inspect --format '{{ .NetworkSettings.IPAddress }}' es
    > 172.17.0.2
  3. Create the kibana config directory

    sudo mkdir -p /data/elk/kibana/
  4. Configure kibana.yml

    elasticsearch.hosts sets the address of the ES service:

    server.name: kibana
    server.host: "0"
    elasticsearch.hosts: ["http://172.17.0.2:9200"]
    xpack.monitoring.ui.container.elasticsearch.enabled: true
  5. Start the service

    sudo docker run -d --restart=always --log-driver json-file --log-opt max-size=100m --log-opt max-file=2 --name kibana -p 5601:5601 -v /data/elk/kibana/kibana.yml:/usr/share/kibana/config/kibana.yml kibana:7.7.1
  6. Verify

    Open [server IP]:5601 in a browser.

    When the Kibana console first loads, the home page offers to add some test data; just click the "Try our sample data" button. Then open Dashboard to see the charts built from the sample data.

The kibana.yml Configuration File

  1. Service port

    Property: server.port, default 5601.

  2. Remote access address

    Property: server.host, which defaults to the local host only.

    To make Kibana reachable from a specific remote host, set this to that host's IP address; to allow access from all remote hosts, set it to 0.0.0.0.

  3. Elasticsearch connection

    Property: elasticsearch.hosts (named elasticsearch.url before Kibana 7.0). It defaults to the local elasticsearch on port 9200, i.e. localhost:9200. If Elasticsearch is not installed on the same host as Kibana, or its port is not 9200, you need to change this setting.

  4. Elasticsearch username and password

    Properties: elasticsearch.username and elasticsearch.password. By default there is no username or password; if elasticsearch has authentication configured, set these two properties.

  5. Switching the UI to Chinese

    Property: i18n.locale: "zh-CN"
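Putting the settings above together, here is a sketch that writes out a complete kibana.yml; the IP, credentials, and path are placeholder values for illustration:

```shell
# Write a kibana.yml combining the settings discussed above (placeholder values)
cat > /tmp/kibana.yml <<'EOF'
server.port: 5601
server.host: "0.0.0.0"
elasticsearch.hosts: ["http://172.17.0.2:9200"]
# only needed when Elasticsearch has authentication enabled
elasticsearch.username: "kibana_user"
elasticsearch.password: "changeme"
i18n.locale: "zh-CN"
EOF
# Count the configured properties as a quick sanity check (comment excluded)
grep -c '^[a-z]' /tmp/kibana.yml   # prints 6
```

Mount this file into the container exactly as in the `docker run` command above (`-v /data/elk/kibana/kibana.yml:/usr/share/kibana/config/kibana.yml`).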
