Class XSSFExcelExtractorDecorator

    • Field Detail

      • hfHelper

        protected static org.apache.poi.xssf.usermodel.helpers.HeaderFooterHelper hfHelper
        Allows access to headers/footers from raw XML strings.
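Header and footer text is stored in the sheet XML as plain strings, for example inside the `oddHeader`/`oddFooter` children of a `headerFooter` element, with Excel's own `&`-codes still embedded. The sketch below shows what such a raw string looks like once it has been pulled out of the XML. It uses only the JDK DOM parser, the class name is hypothetical, and it is not how `hfHelper` itself works.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class HeaderFooterXmlSketch {
    /**
     * Returns the raw text of the first element with the given local name
     * (for example "oddHeader" or "oddFooter"), or null if the sheet has none.
     */
    public static String rawHeaderFooter(String sheetXml, String elementName) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // match by local name whatever the namespace
            dbf.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            Document doc = dbf.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(sheetXml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagNameNS("*", elementName);
            // getTextContent() has already turned "&amp;" back into "&"
            return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse sheet XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<worksheet xmlns=\"http://schemas.openxmlformats.org/spreadsheetml/2006/main\">"
                + "<headerFooter><oddHeader>&amp;LReport&amp;RPage &amp;P</oddHeader></headerFooter>"
                + "</worksheet>";
        System.out.println(rawHeaderFooter(xml, "oddHeader")); // &LReport&RPage &P
        System.out.println(rawHeaderFooter(xml, "oddFooter")); // null
    }
}
```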
      • formatter

        protected final org.apache.poi.ss.usermodel.DataFormatter formatter
      • sheetParts

        protected final List<org.apache.poi.openxml4j.opc.PackagePart> sheetParts
      • drawingHyperlinks

        protected final Map<String,String> drawingHyperlinks
      • metadata

        protected org.apache.tika.metadata.Metadata metadata
      • parseContext

        protected org.apache.tika.parser.ParseContext parseContext
    • Constructor Detail

      • XSSFExcelExtractorDecorator

        public XSSFExcelExtractorDecorator(org.apache.tika.parser.ParseContext context,
                                           org.apache.poi.ooxml.extractor.POIXMLTextExtractor extractor,
                                           Locale locale)
    • Method Detail

      • configureExtractor

        protected void configureExtractor(org.apache.poi.ooxml.extractor.POIXMLTextExtractor extractor,
                                          Locale locale)
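Both the constructor and `configureExtractor` take a `Locale`, and the `formatter` field is a POI `DataFormatter`, whose output depends on the locale. `DataFormatter` applies full Excel format strings. As a rough illustration only, the sketch below uses the JDK's `java.text.NumberFormat` in its place (the class name is hypothetical) to show how the same cell value reads differently under two locales.

```java
import java.text.NumberFormat;
import java.util.Locale;

public class LocaleFormatSketch {
    /** Formats a numeric value using the number conventions of the given locale. */
    public static String formatForLocale(double value, Locale locale) {
        return NumberFormat.getNumberInstance(locale).format(value);
    }

    public static void main(String[] args) {
        System.out.println(formatForLocale(1234.5, Locale.US));      // 1,234.5
        System.out.println(formatForLocale(1234.5, Locale.GERMANY)); // 1.234,5
    }
}
```

Because the grouping and decimal separators swap between locales, the choice of locale changes the extracted text itself, not just how it is displayed.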
      • addDrawingHyperLinks

        protected void addDrawingHyperLinks(org.apache.poi.openxml4j.opc.PackagePart sheetPart)
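`addDrawingHyperLinks` presumably fills the `drawingHyperlinks` map for one sheet. This page does not document what the keys and values mean. Assuming the map goes from relationship id to target URL, the core step is to filter a drawing part's relationships by the standard OOXML hyperlink relationship type. In the sketch below, `Rel` is a hypothetical stand-in for POI's `PackageRelationship`.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DrawingHyperlinkSketch {
    /** Standard OOXML relationship type for hyperlinks. */
    public static final String HYPERLINK_TYPE =
            "http://schemas.openxmlformats.org/officeDocument/2006/relationships/hyperlink";

    /** Hypothetical stand-in for an OPC relationship (id, type, target). */
    public static final class Rel {
        final String id;
        final String type;
        final String target;
        public Rel(String id, String type, String target) {
            this.id = id;
            this.type = type;
            this.target = target;
        }
    }

    /** Collects relationship id -> target URL for every hyperlink relationship of a drawing part. */
    public static Map<String, String> collectHyperlinks(List<Rel> drawingRels) {
        Map<String, String> links = new HashMap<>();
        for (Rel rel : drawingRels) {
            if (HYPERLINK_TYPE.equals(rel.type)) { // skip images, charts, ...
                links.put(rel.id, rel.target);
            }
        }
        return links;
    }

    public static void main(String[] args) {
        List<Rel> rels = List.of(
                new Rel("rId1", HYPERLINK_TYPE, "https://tika.apache.org/"),
                new Rel("rId2", "http://schemas.openxmlformats.org/officeDocument/2006/relationships/image",
                        "../media/image1.png"));
        System.out.println(collectHyperlinks(rels)); // {rId1=https://tika.apache.org/}
    }
}
```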
      • extractHyperLinks

        protected void extractHyperLinks(org.apache.poi.openxml4j.opc.PackagePart sheetPart,
                                         org.apache.tika.sax.XHTMLContentHandler xhtml)
                                  throws SAXException
        Throws:
        SAXException
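`XHTMLContentHandler` is a SAX `ContentHandler`, so writing a hyperlink comes down to start-element and end-element events. The sketch below assumes that each link becomes an `a` element with an `href` attribute. It uses only JDK classes: a `TransformerHandler` serializes the events so that the result can be inspected. Unlike `XHTMLContentHandler`, it emits elements without the XHTML namespace. The class and method names are hypothetical.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

public class HyperlinkSaxSketch {
    /** Writes one <a href="url">text</a> element as SAX events. */
    static void writeLink(ContentHandler handler, String url, String text) throws SAXException {
        AttributesImpl attrs = new AttributesImpl();
        attrs.addAttribute("", "href", "href", "CDATA", url);
        handler.startElement("", "a", "a", attrs);
        char[] chars = text.toCharArray();
        handler.characters(chars, 0, chars.length);
        handler.endElement("", "a", "a");
    }

    /** Serializes the events to a string; the serializer escapes characters such as '&'. */
    public static String renderLink(String url, String text) {
        try {
            SAXTransformerFactory tf = (SAXTransformerFactory) TransformerFactory.newInstance();
            TransformerHandler th = tf.newTransformerHandler();
            th.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            th.setResult(new StreamResult(out));
            th.startDocument();
            writeLink(th, url, text);
            th.endDocument();
            return out.toString();
        } catch (TransformerConfigurationException | SAXException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(renderLink("https://tika.apache.org/", "Apache Tika"));
    }
}
```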
      • extractHeaderFooter

        protected void extractHeaderFooter(String hf,
                                           org.apache.tika.sax.XHTMLContentHandler xhtml)
                                    throws SAXException
        Throws:
        SAXException
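`extractHeaderFooter` receives a raw header or footer string `hf`. In such strings, `&L`, `&C` and `&R` start the left, center and right sections, and `&&` stands for a literal ampersand. The sketch below splits a string on those codes and joins the non-empty sections. It is a simplified, hypothetical stand-in for what `hfHelper` (POI's `HeaderFooterHelper`) provides, not Tika's implementation. Other codes, such as font codes and `&P` page numbers, are left in the text.

```java
import java.util.ArrayList;
import java.util.List;

public class HeaderFooterSketch {
    /**
     * Splits a raw header/footer string into its left (&L), center (&C) and
     * right (&R) sections, then joins the non-empty sections with a space.
     * "&&" is decoded to a literal '&'; all other codes are kept as-is.
     */
    public static String sectionsText(String hf) {
        if (hf == null) {
            return "";
        }
        StringBuilder[] sections = {new StringBuilder(), new StringBuilder(), new StringBuilder()};
        int current = 1; // assumption: text before any section code belongs to the center
        for (int i = 0; i < hf.length(); i++) {
            char c = hf.charAt(i);
            char next = i + 1 < hf.length() ? hf.charAt(i + 1) : '\0';
            if (c == '&' && next == '&') {
                sections[current].append('&'); // escaped literal ampersand
                i++;
            } else if (c == '&' && "LCR".indexOf(next) >= 0) {
                current = "LCR".indexOf(next); // switch to the left, center or right section
                i++;
            } else {
                sections[current].append(c);
            }
        }
        List<String> nonEmpty = new ArrayList<>();
        for (StringBuilder s : sections) {
            if (s.length() > 0) {
                nonEmpty.add(s.toString());
            }
        }
        return String.join(" ", nonEmpty);
    }

    public static void main(String[] args) {
        System.out.println(sectionsText("&LQuarterly Report&RConfidential")); // Quarterly Report Confidential
    }
}
```

A caller can skip writing a paragraph when the result is an empty string.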
      • processShapes

        protected void processShapes(List<org.apache.poi.xssf.usermodel.XSSFShape> shapes,
                                     org.apache.tika.sax.XHTMLContentHandler xhtml)
                              throws SAXException
        Throws:
        SAXException
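`processShapes` receives the shapes from a sheet's drawing. Given the `drawingHyperlinks` field, a plausible reading is that the method writes out each shape's text and resolves any shape hyperlink through that map. The sketch below models only that idea. It uses a hypothetical `Shape` class in place of POI's `XSSFShape` and plain strings in place of XHTML output. The `text <url>` line format is invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class ShapeTextSketch {
    /** Hypothetical stand-in for a drawing shape: its text and the r:id of its click hyperlink. */
    public static final class Shape {
        final String text;           // null for pictures, connectors, ...
        final String hyperlinkRelId; // null when the shape has no hyperlink
        public Shape(String text, String hyperlinkRelId) {
            this.text = text;
            this.hyperlinkRelId = hyperlinkRelId;
        }
    }

    /**
     * Returns one line per shape that carries text; if the shape's hyperlink id
     * resolves in the id -> URL map, the URL is appended in angle brackets.
     */
    public static List<String> describeShapes(List<Shape> shapes, Map<String, String> hyperlinks) {
        List<String> out = new ArrayList<>();
        if (shapes == null) {
            return out; // e.g. a sheet without a drawing
        }
        for (Shape s : shapes) {
            if (s.text == null || s.text.isEmpty()) {
                continue; // nothing textual to extract
            }
            String url = s.hyperlinkRelId == null ? null : hyperlinks.get(s.hyperlinkRelId);
            out.add(url == null ? s.text : s.text + " <" + url + ">");
        }
        return out;
    }

    public static void main(String[] args) {
        List<Shape> shapes = List.of(
                new Shape("Sales by region", null),
                new Shape(null, null),
                new Shape("Project site", "rId1"));
        System.out.println(describeShapes(shapes, Map.of("rId1", "https://tika.apache.org/")));
        // [Sales by region, Project site <https://tika.apache.org/>]
    }
}
```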
      • processSheet

        public void processSheet(org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler.SheetContentsHandler sheetContentsHandler,
                                 org.apache.poi.xssf.model.Comments comments,
                                 org.apache.poi.xssf.model.StylesTable styles,
                                 org.apache.poi.xssf.eventusermodel.ReadOnlySharedStringsTable strings,
                                 InputStream sheetInputStream)
                          throws IOException,
                                 SAXException
        Throws:
        IOException
        SAXException
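`processSheet` reads the sheet from an `InputStream` and reports its contents to the supplied `SheetContentsHandler`. This is POI's event-driven style, which avoids loading the whole sheet into memory. The sketch below imitates that pattern with the JDK's SAX parser and a hypothetical `RowCellHandler` interface. It deliberately leaves out shared-string lookup (`t="s"` cells), styles and comments, which the real method receives as parameters.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class SheetStreamSketch {
    /** Hypothetical row/cell callback interface (not POI's SheetContentsHandler). */
    public interface RowCellHandler {
        void startRow(int rowNum);
        void endRow(int rowNum);
        void cell(String cellReference, String value);
    }

    /** Streams row and cell events out of worksheet XML without building a DOM. */
    public static void streamSheet(InputStream sheetXml, RowCellHandler handler) {
        DefaultHandler sax = new DefaultHandler() {
            private int row;
            private String ref;
            private boolean inValue;
            private final StringBuilder text = new StringBuilder();

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                switch (localName) {
                    case "row": // assumes the optional r attribute is present (1-based)
                        row = Integer.parseInt(atts.getValue("r")) - 1;
                        handler.startRow(row);
                        break;
                    case "c":
                        ref = atts.getValue("r");
                        text.setLength(0);
                        break;
                    case "v": // plain cell value
                    case "t": // inline string text
                        inValue = true;
                        break;
                    default:
                        break; // formulas and other elements are ignored
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (inValue) text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                switch (localName) {
                    case "v":
                    case "t":
                        inValue = false;
                        break;
                    case "c":
                        handler.cell(ref, text.toString());
                        break;
                    case "row":
                        handler.endRow(row);
                        break;
                    default:
                        break;
                }
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // so localName is filled in
            factory.newSAXParser().parse(sheetXml, sax);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse sheet XML", e);
        }
    }

    /** Records the events for one sheet as strings, e.g. "A1=Total". */
    public static List<String> trace(String xml) {
        List<String> events = new ArrayList<>();
        streamSheet(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), new RowCellHandler() {
            public void startRow(int rowNum) { events.add("startRow " + rowNum); }
            public void endRow(int rowNum) { events.add("endRow " + rowNum); }
            public void cell(String ref, String value) { events.add(ref + "=" + value); }
        });
        return events;
    }

    public static void main(String[] args) {
        String xml = "<worksheet xmlns=\"http://schemas.openxmlformats.org/spreadsheetml/2006/main\"><sheetData>"
                + "<row r=\"1\"><c r=\"A1\" t=\"inlineStr\"><is><t>Total</t></is></c><c r=\"B1\"><v>42</v></c></row>"
                + "</sheetData></worksheet>";
        System.out.println(trace(xml)); // [startRow 0, A1=Total, B1=42, endRow 0]
    }
}
```

The sketch buffers only the current cell's text, so memory use stays flat however large the sheet is. That is the usual reason for choosing SAX over a DOM for worksheet parts.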
      • getMainDocumentParts

        protected List<org.apache.poi.openxml4j.opc.PackagePart> getMainDocumentParts()
                throws org.apache.tika.exception.TikaException
        In Excel files, embedded objects hang off the sheets, while images live in the sheet drawings; both kinds of part are therefore treated as main document parts.
        Specified by:
        getMainDocumentParts in class AbstractOOXMLExtractor
        Throws:
        org.apache.tika.exception.TikaException
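The description above suggests that this method gives the extractor both the sheet parts and the drawing parts attached to them, since the drawings hold the images. The sketch below shows a minimal version of that traversal. A hypothetical `Rel` class stands in for POI's `PackageRelationship`, and part names are plain strings that are assumed to be already resolved. The actual method may include further parts.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MainPartsSketch {
    /** Standard OOXML relationship type from a worksheet to its drawing part. */
    public static final String DRAWING_TYPE =
            "http://schemas.openxmlformats.org/officeDocument/2006/relationships/drawing";

    /** Hypothetical stand-in for an OPC relationship whose target is already a part name. */
    public static final class Rel {
        final String type;
        final String target;
        final boolean internal; // false for external targets such as web URLs
        public Rel(String type, String target, boolean internal) {
            this.type = type;
            this.target = target;
            this.internal = internal;
        }
    }

    /** Lists every sheet part, each followed by the internal drawing parts it references. */
    public static List<String> mainParts(Map<String, List<Rel>> relsBySheet) {
        List<String> parts = new ArrayList<>();
        for (Map.Entry<String, List<Rel>> sheet : relsBySheet.entrySet()) {
            parts.add(sheet.getKey()); // the sheet itself, which carries embedded objects
            for (Rel rel : sheet.getValue()) {
                if (rel.internal && DRAWING_TYPE.equals(rel.type)) {
                    parts.add(rel.target); // the drawing that holds the sheet's images
                }
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        Map<String, List<Rel>> rels = new LinkedHashMap<>(); // keeps sheet order
        rels.put("/xl/worksheets/sheet1.xml", List.of(
                new Rel(DRAWING_TYPE, "/xl/drawings/drawing1.xml", true)));
        rels.put("/xl/worksheets/sheet2.xml", List.of());
        System.out.println(mainParts(rels));
        // [/xl/worksheets/sheet1.xml, /xl/drawings/drawing1.xml, /xl/worksheets/sheet2.xml]
    }
}
```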