Package com.knuddels.jtokkit.api
Class GptBytePairEncodingParams
java.lang.Object
com.knuddels.jtokkit.api.GptBytePairEncodingParams
Parameters for the byte pair encoding used to tokenize text for the OpenAI GPT models.
This library supports the encodings listed in EncodingType out of the box.
To use a custom encoding instead, pass its parameters to the library with this class.
Use EncodingRegistry.registerGptBytePairEncoding(GptBytePairEncodingParams) to register your custom encoding
with the registry, so that you can use it alongside the predefined ones.
The encoding parameters are:
- name: The name of the encoding. This is used to identify the encoding and must be unique.
- pattern: The pattern that is used to split the input text into tokens.
- encoder: The encoder that maps the tokens to their ids.
- specialTokensEncoder: The encoder that maps the special tokens to their ids.
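
The four parameters above can be prepared with plain JDK types before they are handed to the constructor. The following is a minimal sketch, not the library's own code: the class name CustomEncodingInputs, the encoding name, the toy split pattern, the tiny vocabulary and the special token id are all assumptions made for illustration. The JTokkit calls appear only in comments so that the block runs without the library on the classpath.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CustomEncodingInputs {

    // Hypothetical encoding name; it only has to be unique within the registry.
    static final String NAME = "my-custom-encoding";

    // Toy split pattern (an assumption, not one of the library's patterns):
    // a run of letters, a run of digits, a run of whitespace,
    // or any other single character.
    static final Pattern PATTERN =
            Pattern.compile("\\p{L}+|\\p{N}+|\\s+|[^\\s\\p{L}\\p{N}]");

    // Toy vocabulary: maps the UTF-8 bytes of each token to its id (0..7).
    static Map<byte[], Integer> buildEncoder() {
        Map<byte[], Integer> encoder = new HashMap<>();
        String[] vocabulary = {"h", "e", "l", "o", " ", "he", "ll", "hello"};
        for (int id = 0; id < vocabulary.length; id++) {
            encoder.put(vocabulary[id].getBytes(StandardCharsets.UTF_8), id);
        }
        return encoder;
    }

    // Special tokens are keyed by their string form. The id 8 is chosen so it
    // does not clash with the regular ids above (an assumption of this sketch).
    static Map<String, Integer> buildSpecialTokensEncoder() {
        Map<String, Integer> specialTokens = new HashMap<>();
        specialTokens.put("<|endoftext|>", 8);
        return specialTokens;
    }

    // Shows what the split pattern does to an input text.
    static List<String> split(String text) {
        List<String> pieces = new ArrayList<>();
        Matcher matcher = PATTERN.matcher(text);
        while (matcher.find()) {
            pieces.add(matcher.group());
        }
        return pieces;
    }

    public static void main(String[] args) {
        Map<byte[], Integer> encoder = buildEncoder();
        Map<String, Integer> specialTokens = buildSpecialTokensEncoder();
        System.out.println(NAME + ": " + encoder.size() + " tokens, "
                + specialTokens.size() + " special token(s)");
        System.out.println(split("hello world 42!"));

        // With JTokkit on the classpath, the values would be passed on as:
        //   GptBytePairEncodingParams params =
        //       new GptBytePairEncodingParams(NAME, PATTERN, encoder, specialTokens);
        //   registry.registerGptBytePairEncoding(params); // registry: an EncodingRegistry
    }
}
```

One detail worth keeping in mind when building the encoder map: Java arrays compare by identity, so looking up a freshly created byte[] in a plain HashMap<byte[], Integer> does not find an entry that was stored under a different array with the same contents.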
Constructor Summary
Constructors
Constructor: GptBytePairEncodingParams(String name, Pattern pattern, Map<byte[], Integer> encoder, Map<String, Integer> specialTokensEncoder)
Description: Creates a new instance of GptBytePairEncodingParams.
Method Summary
Constructor Details
GptBytePairEncodingParams
public GptBytePairEncodingParams(String name, Pattern pattern, Map<byte[], Integer> encoder, Map<String, Integer> specialTokensEncoder)
Creates a new instance of GptBytePairEncodingParams.
Parameters:
- name: the name of the encoding. This is used to identify the encoding and must be unique.
- pattern: the pattern that is used to split the input text into tokens.
- encoder: the encoder that maps the tokens to their ids.
- specialTokensEncoder: the encoder that maps the special tokens to their ids.
Method Details
getName
getPattern
getEncoder
getSpecialTokensEncoder