Elasticsearch 权威指南
  • Elasticsearch 权威指南
  • 入门
    • 初识
    • 安装
    • API
    • 文档
    • 索引
    • 搜索
    • 汇总
    • 小结
    • 分布式
    • 本章总结
  • 分布式集群
    • 空集群
    • 集群健康
    • 添加索引
    • 容错移转
    • 横向扩展
    • 扩展
    • 故障恢复
  • 数据
    • 文档
    • 索引
    • Get
    • 存在
    • 更新
    • 创建
    • 删除
    • 版本控制
    • 局部更新
    • Mget
    • Bulk
    • 总结
  • 分布式文档存储
    • 路由
    • 主从互通
    • 创建索引删除
    • 获取
    • 局部更新
    • 批量请求
    • 批量格式
  • 搜索
    • 空白搜索
    • 多索引多类型
    • 分页
    • 查询语句
  • 映射与统计
    • Exact_vs_full_text
    • Inverted_index
    • Analysis
    • Mapping
    • Complex_datatypes
Powered by GitBook
On this page

Was this helpful?

  1. 分布式文档存储

批量格式

When we learned about Bulk requests earlier in <>, you may have asked yourself: `Why does thebulkAPI require the funny format with the newline characters, instead of just sending the requests wrapped in a JSON array, like themget` API?''

To answer this, we need to explain a little background:

Each document referenced in a bulk request may belong to a different primary shard, each of which may be allocated to any of the nodes in the cluster. This means that every action inside a bulk request needs to be forwarded to the correct shard on the correct node.

If the individual requests were wrapped up in a JSON array, that would mean that we would need to:

  • parse the JSON into an array (including the document data, which

    can be very large)

  • look at each request to determine which shard it should go to

  • create an array of requests for each shard

  • serialize these arrays into the internal transport format

  • send the requests to each shard

It would work, but would need a lot of RAM to hold copies of essentially the same data, and would create many more data structures that the JVM would have to spend time garbage collecting.

Instead, Elasticsearch reaches up into the networking buffer, where the raw request has been received and reads the data directly. It uses the newline characters to identify and parse just the small action/metadata lines in order to decide which shard should handle each request.

These raw requests are forwarded directly to the correct shard. There is no redundant copying of data, no wasted data structures. The entire request process is handled in the smallest amount of memory possible.

Previous批量请求Next搜索

Last updated 5 years ago

Was this helpful?