public abstract class AbstractStringTransformer extends AbstractCharStreamTransformer
Base class to facilitate creating transformers on text content, loading text into a StringBuilder for in-memory processing.
Since 2.2.0, this class limits the memory used for content transformation by reading one section of text at a time. Each section is sent for transformation as soon as it is read, so that no two sections exist in memory at once. Subclasses should respect this approach. Each section holds at most as many characters as the maximum read size defined using setMaxReadSize(int). When none is set, the default read size is defined by TextReader.DEFAULT_MAX_READ_SIZE.
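As a rough worked bound (our own arithmetic, not taken from the library documentation): if every section holds at most M = maxReadSize characters, a text of N characters is delivered in at least N/M sections, rounded up, and the string builder only needs to hold one section of at most M characters at a time instead of the whole text. Breaking early at a paragraph, sentence, or word can only increase the section count.

```latex
n_{\text{sections}} \ge \left\lceil \frac{N}{M} \right\rceil,
\qquad |\text{section}_i| \le M
```

For example, with a hypothetical maxReadSize of 1,000 and a 2,500-character text, at least three sections are produced.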
An attempt is made to break sections cleanly after a paragraph, sentence, or word. When that is not possible, long text is cut at exactly the maximum read size.
Implementors should be mindful of memory usage when working with the string builder.
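The sectioning behaviour described above can be pictured with the following standalone sketch. It is not Norconex's TextReader: the class SectionSplitter, its forEachSection method, and the break heuristics (a blank line for a paragraph, '.', '!' or '?' followed by whitespace for a sentence, any whitespace for a word) are illustrative assumptions. It keeps at most maxReadSize characters of pending text and hands each section to a callback before reading the next one.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.function.Consumer;

/** Hypothetical illustration of section-by-section reading (not Norconex's TextReader). */
public class SectionSplitter {

    /** Reads the input and passes sections of at most maxReadSize characters to the consumer. */
    public static void forEachSection(Reader reader, int maxReadSize,
            Consumer<String> consumer) throws IOException {
        if (maxReadSize < 1) {
            throw new IllegalArgumentException("maxReadSize must be >= 1");
        }
        StringBuilder buffer = new StringBuilder(maxReadSize);
        char[] chunk = new char[maxReadSize];
        boolean eof = false;
        while (!eof || buffer.length() > 0) {
            // Top up the buffer to maxReadSize characters, or until the input ends.
            while (!eof && buffer.length() < maxReadSize) {
                int n = reader.read(chunk, 0, maxReadSize - buffer.length());
                if (n == -1) {
                    eof = true;
                } else {
                    buffer.append(chunk, 0, n);
                }
            }
            if (buffer.length() == 0) {
                break;
            }
            // At end of input the remainder fits in one section; otherwise look for a clean break.
            int cut = eof ? buffer.length() : findCut(buffer, maxReadSize);
            consumer.accept(buffer.substring(0, cut));
            buffer.delete(0, cut); // keep only the leftover for the next section
        }
    }

    /** Returns a cut position after a paragraph, then a sentence, then a word; else maxReadSize. */
    static int findCut(CharSequence text, int maxReadSize) {
        String window = text.subSequence(0, maxReadSize).toString();
        int paragraph = window.lastIndexOf("\n\n");
        if (paragraph > 0) {
            return paragraph + 2;
        }
        for (int i = window.length() - 2; i > 0; i--) {
            char c = window.charAt(i);
            if ((c == '.' || c == '!' || c == '?')
                    && Character.isWhitespace(window.charAt(i + 1))) {
                return i + 2;
            }
        }
        for (int i = window.length() - 1; i > 0; i--) {
            if (Character.isWhitespace(window.charAt(i))) {
                return i + 1;
            }
        }
        return maxReadSize; // no boundary found: hard cut
    }
}
```

With the text "One two. Three four.\n\nFive six seven." and a maxReadSize of 12, this sketch yields the sections "One two. ", "Three four.\n", "\nFive six " and "seven.", none longer than 12 characters.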
Subclasses inherit this IXMLConfigurable configuration:

    <!-- parent tag has these attributes:
         maxReadSize="(max characters to read at once)"
         sourceCharset="(character encoding)" -->
    <restrictTo caseSensitive="[false|true]"
            field="(name of header/metadata field to match)">
        (regular expression of value to match)
    </restrictTo>
    <!-- multiple "restrictTo" tags allowed (only one needs to match) -->
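To make the restrictTo semantics concrete, here is a hedged sketch of how such restrictions could be evaluated. RestrictionSketch and its Restriction class are hypothetical, metadata is modeled as a plain Map, and two behaviours are assumptions rather than documented facts: the regular expression must match the whole field value, and an empty restriction list makes the handler applicable to every document.

```java
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

public class RestrictionSketch {

    /** Hypothetical model of one {@code <restrictTo>} entry (not the Norconex API). */
    public static final class Restriction {
        private final String field;
        private final Pattern pattern;

        public Restriction(String field, String regex, boolean caseSensitive) {
            this.field = field;
            // Assumption: caseSensitive="false" means case-insensitive regex matching.
            int flags = caseSensitive ? 0 : Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE;
            this.pattern = Pattern.compile(regex, flags);
        }

        /** True if any value of the field matches the regex (assumption: whole-value match). */
        public boolean matches(Map<String, List<String>> metadata) {
            List<String> values = metadata.get(field);
            if (values == null) {
                return false; // field absent: this restriction does not match
            }
            for (String value : values) {
                if (pattern.matcher(value).matches()) {
                    return true;
                }
            }
            return false;
        }
    }

    /** Only one restriction needs to match; assumption: no restrictions means always applicable. */
    public static boolean isApplicable(Map<String, List<String>> metadata,
            List<Restriction> restrictions) {
        if (restrictions.isEmpty()) {
            return true;
        }
        for (Restriction restriction : restrictions) {
            if (restriction.matches(metadata)) {
                return true;
            }
        }
        return false;
    }
}
```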
Constructor and Description |
---|
AbstractStringTransformer() |
Modifier and Type | Method and Description |
---|---|
boolean | equals(Object obj) |
int | getMaxReadSize(): Gets the maximum number of characters to read and transform at once. |
int | hashCode() |
protected void | loadCharStreamTransformerFromXML(org.apache.commons.configuration.XMLConfiguration xml): Loads configuration settings specific to the implementing class. |
protected abstract void | loadStringTransformerFromXML(org.apache.commons.configuration.XMLConfiguration xml): Loads configuration settings specific to the implementing class. |
protected void | saveCharStreamTransformerToXML(EnhancedXMLStreamWriter writer): Saves configuration settings specific to the implementing class. |
protected abstract void | saveStringTransformerToXML(EnhancedXMLStreamWriter writer): Saves configuration settings specific to the implementing class. |
void | setMaxReadSize(int maxReadSize): Sets the maximum number of characters to read and transform at once. |
String | toString() |
protected abstract void | transformStringContent(String reference, StringBuilder content, ImporterMetadata metadata, boolean parsed, int sectionIndex) |
protected void | transformTextDocument(String reference, Reader input, Writer output, ImporterMetadata metadata, boolean parsed) |

Methods inherited from AbstractCharStreamTransformer and its superclasses:
getSourceCharset, loadHandlerFromXML, saveHandlerToXML, setSourceCharset, transformApplicableDocument
transformDocument
addRestriction, addRestriction, addRestrictions, clearRestrictions, detectCharsetIfBlank, getRestrictions, isApplicable, loadFromXML, removeRestriction, removeRestriction, saveToXML

protected final void transformTextDocument(String reference, Reader input, Writer output, ImporterMetadata metadata, boolean parsed) throws ImporterHandlerException
Specified by: transformTextDocument in class AbstractCharStreamTransformer
Throws: ImporterHandlerException

public int getMaxReadSize()
Gets the maximum number of characters to read and transform at once. Default is TextReader.DEFAULT_MAX_READ_SIZE.

public void setMaxReadSize(int maxReadSize)
Sets the maximum number of characters to read and transform at once.
Parameters: maxReadSize - maximum read size

protected abstract void transformStringContent(String reference, StringBuilder content, ImporterMetadata metadata, boolean parsed, int sectionIndex) throws ImporterHandlerException
Throws: ImporterHandlerException
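The split between the final transformTextDocument and the abstract transformStringContent follows the template-method pattern. The sketch below mimics it with JDK types only: SketchTransformer, transformSections, and DigitMasker are hypothetical names, and ImporterMetadata is replaced by a plain Map. The subclass edits each section's StringBuilder in place rather than building a second string.

```java
import java.io.IOException;
import java.io.Writer;
import java.util.List;
import java.util.Map;

public class StringTransformerSketch {

    /** Hypothetical stand-in for AbstractStringTransformer, using JDK types only. */
    public abstract static class SketchTransformer {

        /** Mirrors transformStringContent: modify one section in place. */
        protected abstract void transformStringContent(String reference,
                StringBuilder content, Map<String, List<String>> metadata,
                boolean parsed, int sectionIndex);

        /** Final driver, loosely mirroring transformTextDocument: one section at a time. */
        public final void transformSections(String reference, Iterable<String> sections,
                Writer output, Map<String, List<String>> metadata, boolean parsed)
                throws IOException {
            int sectionIndex = 0;
            for (String section : sections) {
                StringBuilder content = new StringBuilder(section);
                transformStringContent(reference, content, metadata, parsed, sectionIndex++);
                output.append(content); // the builder can be garbage-collected once written
            }
            output.flush();
        }
    }

    /** Hypothetical subclass: replaces every digit with '#', editing the builder in place. */
    public static class DigitMasker extends SketchTransformer {
        @Override
        protected void transformStringContent(String reference, StringBuilder content,
                Map<String, List<String>> metadata, boolean parsed, int sectionIndex) {
            for (int i = 0; i < content.length(); i++) {
                if (Character.isDigit(content.charAt(i))) {
                    content.setCharAt(i, '#');
                }
            }
        }
    }
}
```

For example, feeding the sections "Call 555-1234" and " or 911." through DigitMasker writes "Call ###-#### or ###." to the output.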

protected final void saveCharStreamTransformerToXML(EnhancedXMLStreamWriter writer) throws XMLStreamException
Description copied from class: AbstractCharStreamTransformer
Saves configuration settings specific to the implementing class.
Specified by: saveCharStreamTransformerToXML in class AbstractCharStreamTransformer
Parameters: writer - the XML writer
Throws: XMLStreamException - could not save to XML

protected abstract void saveStringTransformerToXML(EnhancedXMLStreamWriter writer) throws XMLStreamException
Saves configuration settings specific to the implementing class.
Parameters: writer - the XML writer
Throws: XMLStreamException - could not save to XML

protected final void loadCharStreamTransformerFromXML(org.apache.commons.configuration.XMLConfiguration xml) throws IOException
Description copied from class: AbstractCharStreamTransformer
Loads configuration settings specific to the implementing class.
Specified by: loadCharStreamTransformerFromXML in class AbstractCharStreamTransformer
Parameters: xml - XML configuration
Throws: IOException - could not load from XML

protected abstract void loadStringTransformerFromXML(org.apache.commons.configuration.XMLConfiguration xml) throws IOException
Loads configuration settings specific to the implementing class.
Parameters: xml - XML configuration
Throws: IOException - could not load from XML

public boolean equals(Object obj)
Overrides: equals in class AbstractCharStreamTransformer

public int hashCode()
Overrides: hashCode in class AbstractCharStreamTransformer

public String toString()
Overrides: toString in class AbstractCharStreamTransformer
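The save/load methods above pair a final method in the parent with an abstract hook for subclass-specific settings. The sketch below reproduces that pairing with the JDK's StAX writer and DOM parser instead of EnhancedXMLStreamWriter and XMLConfiguration. XmlConfigSketch, BaseTransformer, ReplacingTransformer, the replacement element, and the default of 100 are all hypothetical; in the sketch, the final methods own the maxReadSize attribute and delegate everything else to the hooks.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XmlConfigSketch {

    /** Hypothetical base: final methods own maxReadSize, abstract hooks own the rest. */
    public abstract static class BaseTransformer {
        private int maxReadSize = 100; // hypothetical default, not TextReader.DEFAULT_MAX_READ_SIZE

        public int getMaxReadSize() { return maxReadSize; }
        public void setMaxReadSize(int maxReadSize) { this.maxReadSize = maxReadSize; }

        public final String saveToXML(String tagName) throws XMLStreamException {
            StringWriter out = new StringWriter();
            XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            writer.writeStartElement(tagName);
            writer.writeAttribute("maxReadSize", Integer.toString(maxReadSize));
            saveSubclassToXML(writer); // plays the role of saveStringTransformerToXML
            writer.writeEndElement();
            writer.flush();
            writer.close();
            return out.toString();
        }

        public final void loadFromXML(String xml) throws Exception {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Refuse DOCTYPE declarations to guard against XXE in untrusted configuration.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Element root = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            String size = root.getAttribute("maxReadSize");
            if (!size.isEmpty()) {
                maxReadSize = Integer.parseInt(size); // attribute absent: keep current value
            }
            loadSubclassFromXML(root); // plays the role of loadStringTransformerFromXML
        }

        protected abstract void saveSubclassToXML(XMLStreamWriter writer) throws XMLStreamException;

        protected abstract void loadSubclassFromXML(Element root);
    }

    /** Hypothetical subclass adding a single replacement setting. */
    public static class ReplacingTransformer extends BaseTransformer {
        private String replacement = "";

        public String getReplacement() { return replacement; }
        public void setReplacement(String replacement) { this.replacement = replacement; }

        @Override
        protected void saveSubclassToXML(XMLStreamWriter writer) throws XMLStreamException {
            writer.writeStartElement("replacement");
            writer.writeCharacters(replacement);
            writer.writeEndElement();
        }

        @Override
        protected void loadSubclassFromXML(Element root) {
            NodeList nodes = root.getElementsByTagName("replacement");
            if (nodes.getLength() > 0) {
                replacement = nodes.item(0).getTextContent();
            }
        }
    }
}
```

Making the parent methods final keeps the shared attributes consistent across subclasses, while each subclass only reads and writes its own settings.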

Copyright © 2009–2021 Norconex Inc. All rights reserved.