Using BertServer¶
Installation¶
The best way to install the server is via pip. Note that the server can be installed separately from BertClient or even on a different machine:
pip install bert-serving-server
Warning
The server MUST be running on Python >= 3.5 with TensorFlow >= 1.10 (one-point-ten). Again, the server does not support Python 2!
Command Line Interface¶
Once installed, you can use the command-line interface to start a BERT server:
bert-serving-start -model_dir /uncased_bert_model -num_worker 4
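The server can also be started from Python. Below is a minimal sketch using the package's get_args_parser and BertServer helpers; the model path and worker count are placeholders:
from bert_serving.server import BertServer
from bert_serving.server.helper import get_args_parser

# build the same arguments the CLI would parse
args = get_args_parser().parse_args(['-model_dir', '/uncased_bert_model',
                                     '-num_worker', '4'])
server = BertServer(args)
server.start()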
Server-side API¶
The server side is a CLI program, bert-serving-start; you can get the latest usage via:
bert-serving-start --help
Start a BertServer for serving
usage: bert-serving-server [-h] -model_dir MODEL_DIR
[-tuned_model_dir TUNED_MODEL_DIR]
[-ckpt_name CKPT_NAME] [-config_name CONFIG_NAME]
[-graph_tmp_dir GRAPH_TMP_DIR]
[-max_seq_len MAX_SEQ_LEN] [-cased_tokenization]
[-pooling_layer POOLING_LAYER [POOLING_LAYER ...]]
[-pooling_strategy {NONE,REDUCE_MAX,REDUCE_MEAN,REDUCE_MEAN_MAX,FIRST_TOKEN,LAST_TOKEN,CLS_POOLED,CLASSIFICATION,REGRESSION}]
[-mask_cls_sep] [-no_special_token]
[-show_tokens_to_client] [-no_position_embeddings]
[-num_labels NUM_LABELS] [-port PORT]
[-port_out PORT_OUT] [-http_port HTTP_PORT]
[-http_max_connect HTTP_MAX_CONNECT] [-cors CORS]
[-num_worker NUM_WORKER]
[-max_batch_size MAX_BATCH_SIZE]
[-priority_batch_size PRIORITY_BATCH_SIZE] [-cpu]
[-xla] [-fp16]
[-gpu_memory_fraction GPU_MEMORY_FRACTION]
[-device_map DEVICE_MAP [DEVICE_MAP ...]]
[-prefetch_size PREFETCH_SIZE]
[-fixed_embed_length] [-verbose] [-version]
Named Arguments¶
-verbose | turn on TensorFlow logging for debugging. Default: False |
-version | show program’s version number and exit |
File Paths¶
configure the path, checkpoint, and filenames of a pretrained/fine-tuned BERT model
-model_dir | directory of a pretrained BERT model |
-tuned_model_dir | directory of a fine-tuned BERT model (see the example after this table) |
-ckpt_name | filename of the checkpoint file. By default it is “bert_model.ckpt”, but for a fine-tuned model the name could be different. Default: “bert_model.ckpt” |
-config_name | filename of the JSON config file for BERT model. Default: “bert_config.json” |
-graph_tmp_dir | path to graph temp file |
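For instance, to serve a fine-tuned model you would typically keep -model_dir pointing at the original pretrained model (for the vocab and config files) and add -tuned_model_dir plus -ckpt_name. A sketch from Python; the tuned directory and checkpoint name below are hypothetical:
from bert_serving.server import BertServer
from bert_serving.server.helper import get_args_parser

args = get_args_parser().parse_args(['-model_dir', '/uncased_bert_model',
                                     '-tuned_model_dir', '/my_fine_tuned_model',  # hypothetical path
                                     '-ckpt_name', 'model.ckpt-12345'])  # hypothetical checkpoint name
BertServer(args).start()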
BERT Parameters¶
configure how the BERT model and pooling work
-max_seq_len | maximum length of a sequence; longer sequences will be trimmed on the right side. Set it to NONE to dynamically use the longest sequence in a (mini)batch. Default: 25 |
-cased_tokenization | whether the tokenizer should skip the default lowercasing and accent removal. Should be used for, e.g., the multilingual cased pretrained BERT model. Default: True |
-pooling_layer | the encoder layer(s) that pooling operates on. Give a list to concatenate several layers into one. Default: [-2] |
-pooling_strategy | Possible choices: NONE, REDUCE_MAX, REDUCE_MEAN, REDUCE_MEAN_MAX, FIRST_TOKEN, LAST_TOKEN, CLS_POOLED, CLASSIFICATION, REGRESSION. The pooling strategy for generating encoding vectors (see the example after this table). Default: REDUCE_MEAN |
-mask_cls_sep | mask the embeddings of [CLS] and [SEP] with zero. When pooling_strategy is in {CLS_TOKEN, FIRST_TOKEN, SEP_TOKEN, LAST_TOKEN} the embedding is preserved; otherwise it is masked to zero before pooling. Default: False |
-no_special_token | by default, [CLS] and [SEP] are added to every sequence; when this is True and is_tokenized=True on the client, sequences are fed to the model without [CLS] and [SEP]. Default: False |
-show_tokens_to_client | send tokenization results to the client. Default: False |
-no_position_embeddings | skip adding position embeddings for the position of each token in the sequence. Default: False |
-num_labels | number of labels. Default: 2 |
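As a concrete illustration of the pooling options, the following sketch serves token-level (unpooled) embeddings by combining -pooling_strategy NONE with -max_seq_len NONE; the model path is a placeholder:
from bert_serving.server import BertServer
from bert_serving.server.helper import get_args_parser

args = get_args_parser().parse_args([
    '-model_dir', '/uncased_bert_model',
    '-pooling_strategy', 'NONE',  # one vector per token instead of one per sequence
    '-max_seq_len', 'NONE',  # use the longest sequence in each mini-batch
    '-pooling_layer', '-2',  # take embeddings from the second-to-last encoder layer
])
BertServer(args).start()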
Serving Configs¶
configure how the server utilizes GPU/CPU resources
-port, -port_in, -port_data | server port for receiving data from the client. Default: 5555 |
-port_out, -port_result | server port for sending results to the client. Default: 5556 |
-http_port | server port for receiving HTTP requests |
-http_max_connect | maximum number of concurrent HTTP connections. Default: 10 |
-cors | setting “Access-Control-Allow-Origin” for HTTP requests Default: “*” |
-num_worker | number of server instances Default: 1 |
-max_batch_size | maximum number of sequences handled by each worker. Default: 256 |
-priority_batch_size | batches smaller than this size will be labeled as high priority and jump forward in the job queue. Default: 16 |
-cpu | running on CPU (default on GPU) Default: False |
-xla | enable XLA compiler (experimental) Default: False |
-fp16 | use float16 precision (experimental) Default: False |
-gpu_memory_fraction | the fraction of each visible GPU's overall memory to allocate per worker. Should be in range [0.0, 1.0]. Default: 0.5 |
-device_map | specify the list of GPU device IDs to use (IDs start from 0). If num_worker > len(device_map), devices will be reused; if num_worker < len(device_map), only device_map[:num_worker] will be used (see the sketch after this table). Default: [] |
-prefetch_size | the number of batches to prefetch on each worker. When running on a CPU-only machine, this is set to 0 for compatibility. Default: 10 |
-fixed_embed_length | when max_seq_len is set to NONE, the server determines max_seq_len from the actual sequence lengths within each batch. When pooling_strategy=NONE, this may cause two .encode() calls from the same client to return results of different sizes [B, T, D]. Turn this on to fix T in [B, T, D] to max_position_embeddings from the BERT JSON config. Default: False |
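To make the resource options concrete, the sketch below runs four workers pinned to GPUs 0-3, each allowed at most half of its GPU's memory; the model path and device IDs are illustrative:
from bert_serving.server import BertServer
from bert_serving.server.helper import get_args_parser

args = get_args_parser().parse_args([
    '-model_dir', '/uncased_bert_model',
    '-num_worker', '4',  # one server instance per GPU
    '-device_map', '0', '1', '2', '3',  # pin workers to GPUs 0-3
    '-gpu_memory_fraction', '0.5',  # cap each worker at half of its GPU's memory
])
BertServer(args).start()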
Server-side Benchmark¶
If you want to benchmark the speed, you may use:
bert-serving-benchmark --help
Benchmark BertServer locally
usage: bert-serving-benchmark [-h] -model_dir MODEL_DIR
[-tuned_model_dir TUNED_MODEL_DIR]
[-ckpt_name CKPT_NAME]
[-config_name CONFIG_NAME]
[-graph_tmp_dir GRAPH_TMP_DIR]
[-max_seq_len MAX_SEQ_LEN] [-cased_tokenization]
[-pooling_layer POOLING_LAYER [POOLING_LAYER ...]]
[-pooling_strategy {NONE,REDUCE_MAX,REDUCE_MEAN,REDUCE_MEAN_MAX,FIRST_TOKEN,LAST_TOKEN,CLS_POOLED,CLASSIFICATION,REGRESSION}]
[-mask_cls_sep] [-no_special_token]
[-show_tokens_to_client]
[-no_position_embeddings]
[-num_labels NUM_LABELS] [-port PORT]
[-port_out PORT_OUT] [-http_port HTTP_PORT]
[-http_max_connect HTTP_MAX_CONNECT]
[-cors CORS] [-num_worker NUM_WORKER]
[-max_batch_size MAX_BATCH_SIZE]
[-priority_batch_size PRIORITY_BATCH_SIZE]
[-cpu] [-xla] [-fp16]
[-gpu_memory_fraction GPU_MEMORY_FRACTION]
[-device_map DEVICE_MAP [DEVICE_MAP ...]]
[-prefetch_size PREFETCH_SIZE]
[-fixed_embed_length] [-verbose] [-version]
[-test_client_batch_size [TEST_CLIENT_BATCH_SIZE [TEST_CLIENT_BATCH_SIZE ...]]]
[-test_max_batch_size [TEST_MAX_BATCH_SIZE [TEST_MAX_BATCH_SIZE ...]]]
[-test_max_seq_len [TEST_MAX_SEQ_LEN [TEST_MAX_SEQ_LEN ...]]]
[-test_num_client [TEST_NUM_CLIENT [TEST_NUM_CLIENT ...]]]
[-test_pooling_layer [TEST_POOLING_LAYER [TEST_POOLING_LAYER ...]]]
[-wait_till_ready WAIT_TILL_READY]
[-client_vocab_file CLIENT_VOCAB_FILE]
[-num_repeat NUM_REPEAT]
Named Arguments¶
-verbose | turn on TensorFlow logging for debugging. Default: False |
-version | show program’s version number and exit |
File Paths¶
configure the path, checkpoint, and filenames of a pretrained/fine-tuned BERT model
-model_dir | directory of a pretrained BERT model |
-tuned_model_dir | directory of a fine-tuned BERT model |
-ckpt_name | filename of the checkpoint file. By default it is “bert_model.ckpt”, but for a fine-tuned model the name could be different. Default: “bert_model.ckpt” |
-config_name | filename of the JSON config file for BERT model. Default: “bert_config.json” |
-graph_tmp_dir | path to graph temp file |
BERT Parameters¶
configure how the BERT model and pooling work
-max_seq_len | maximum length of a sequence; longer sequences will be trimmed on the right side. Set it to NONE to dynamically use the longest sequence in a (mini)batch. Default: 25 |
-cased_tokenization | whether the tokenizer should skip the default lowercasing and accent removal. Should be used for, e.g., the multilingual cased pretrained BERT model. Default: True |
-pooling_layer | the encoder layer(s) that pooling operates on. Give a list to concatenate several layers into one. Default: [-2] |
-pooling_strategy | Possible choices: NONE, REDUCE_MAX, REDUCE_MEAN, REDUCE_MEAN_MAX, FIRST_TOKEN, LAST_TOKEN, CLS_POOLED, CLASSIFICATION, REGRESSION. The pooling strategy for generating encoding vectors. Default: REDUCE_MEAN |
-mask_cls_sep | mask the embeddings of [CLS] and [SEP] with zero. When pooling_strategy is in {CLS_TOKEN, FIRST_TOKEN, SEP_TOKEN, LAST_TOKEN} the embedding is preserved; otherwise it is masked to zero before pooling. Default: False |
-no_special_token | by default, [CLS] and [SEP] are added to every sequence; when this is True and is_tokenized=True on the client, sequences are fed to the model without [CLS] and [SEP]. Default: False |
-show_tokens_to_client | send tokenization results to the client. Default: False |
-no_position_embeddings | skip adding position embeddings for the position of each token in the sequence. Default: False |
-num_labels | number of labels. Default: 2 |
Serving Configs¶
configure how the server utilizes GPU/CPU resources
-port, -port_in, -port_data | server port for receiving data from the client. Default: 5555 |
-port_out, -port_result | server port for sending results to the client. Default: 5556 |
-http_port | server port for receiving HTTP requests |
-http_max_connect | maximum number of concurrent HTTP connections. Default: 10 |
-cors | setting “Access-Control-Allow-Origin” for HTTP requests Default: “*” |
-num_worker | number of server instances Default: 1 |
-max_batch_size | maximum number of sequences handled by each worker. Default: 256 |
-priority_batch_size | batches smaller than this size will be labeled as high priority and jump forward in the job queue. Default: 16 |
-cpu | running on CPU (default on GPU) Default: False |
-xla | enable XLA compiler (experimental) Default: False |
-fp16 | use float16 precision (experimental) Default: False |
-gpu_memory_fraction | the fraction of each visible GPU's overall memory to allocate per worker. Should be in range [0.0, 1.0]. Default: 0.5 |
-device_map | specify the list of GPU device IDs to use (IDs start from 0). If num_worker > len(device_map), devices will be reused; if num_worker < len(device_map), only device_map[:num_worker] will be used. Default: [] |
-prefetch_size | the number of batches to prefetch on each worker. When running on a CPU-only machine, this is set to 0 for compatibility. Default: 10 |
-fixed_embed_length | when max_seq_len is set to NONE, the server determines max_seq_len from the actual sequence lengths within each batch. When pooling_strategy=NONE, this may cause two .encode() calls from the same client to return results of different sizes [B, T, D]. Turn this on to fix T in [B, T, D] to max_position_embeddings from the BERT JSON config. Default: False |
Benchmark parameters¶
configure the experiments of the benchmark
-test_client_batch_size | Default: [1, 16, 256, 4096] |
-test_max_batch_size | Default: [8, 32, 128, 512] |
-test_max_seq_len | Default: [32, 64, 128, 256] |
-test_num_client | Default: [1, 4, 16, 64] |
-test_pooling_layer | Default: [[-1], [-2], [-3], [-4], [-5], [-6], [-7], [-8], [-9], [-10], [-11], [-12]] |
-wait_till_ready | seconds to wait until the server is ready to serve. Default: 30 |
-client_vocab_file | file path for building the client vocabulary. Default: “README.md” |
-num_repeat | number of repeats per experiment (must be >2), as the first two results are omitted to account for warm-up effects. Default: 10 |
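For example, to sweep only over the number of concurrent clients while leaving the other grids at their defaults, an invocation might look like:
bert-serving-benchmark -model_dir /uncased_bert_model -test_num_client 1 4 16 64 -num_repeat 10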