Sandbox: GUAISA / SUNRITE FARMS

Intent ID: 25

1) Upload a file and get the raw OCR JSON
The last uploaded file and its raw OCR output are saved so that the mapping can be re-tested without uploading the file again.
Download the last file (0262516.PDF)
2) Transformation rules
Rules for transforming OCR data into the target schema (including validation).
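The rules themselves are not shown on this page, but the discrepancy messages in the Pass 1 report refer to them as "pattern X in column N". A minimal sketch of such a lookup, assuming the OCR table is a list of rows of strings, might look like the following. The function name `apply_rule`, its parameters, and the toy table layout are hypothetical and are not taken from the tool.

```python
from typing import List, Optional


def apply_rule(table: List[List[str]], pattern: str,
               match_col: int, value_col: int) -> Optional[str]:
    """Return the cell in value_col of the first row whose match_col contains pattern.

    Returns None when no row matches, or when the matching row has no
    usable value in value_col (hypothetical behaviour, for illustration).
    """
    for row in table:
        if match_col < len(row) and pattern in row[match_col]:
            if value_col < len(row) and row[value_col].strip():
                return row[value_col].strip()
            return None
    return None


# Toy table with an assumed layout; real OCR output will differ.
table = [
    ["PACKING", "0261178"],
    ["Carrier", "PACIFIC AIR"],
    ["Totals", "7"],
]

invoice_number = apply_rule(table, "PACKING", 0, 1)
awb = apply_rule(table, "Carrier", 0, 1)
# The pattern is found, but the row has no column 2, so nothing is returned.
total_amount = apply_rule(table, "Totals", 0, 2)
```

A rule that returns None would then be reported as a discrepancy rather than silently filled in.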
LLM hint prompt
Used by the "Suggest rules" button.
Raw OCR JSON (Pass 1)
{
  "awb": null,
  "items": [],
  "country": "Karaganda, Republic of Kazakhstan",
  "summary": {
    "total_boxes": 4,
    "total_stems": null,
    "total_amount": null
  },
  "supplier": "MECHTA OOO.-RU",
  "invoice_date": null,
  "invoice_number": null,
  "processing_report": {
    "notes": "Invoice type: flat (as per rules), but actual item data in header_text suggests grouped/mixed boxes. No physical boxes found in table data matching item structure. Problems: Item extraction failed due to mismatch of OCR table data with expected item column structure. Multi-box expansion not applicable as no items were extracted. Multiple key metadata fields (invoice_number, invoice_date, AWB, summary.total_stems, summary.total_amount) could not be extracted due to strict adherence to specified column indices/patterns in rules. Discrepancies between extracted items summary and expected summary: stems (0 vs null), amount (0 vs null), boxes (0 vs 4).",
    "status": "error",
    "sum_boxes": 0,
    "sum_stems": 0,
    "sum_amount": 0,
    "items_count": 0,
    "discrepancies": [
      "Could not extract any items due to mismatch between rules and OCR table structure for item lines. Expected item lines with 7 columns, but found no matching lines.",
      "Invoice number could not be extracted (pattern 'PACKING' in column 0 not found).",
      "Invoice date could not be extracted (pattern 'Date Amount PO # Terms' in column 0 not found).",
      "AWB could not be extracted (pattern 'Carrier' in column 0 not found).",
      "Summary total stems could not be extracted (pattern 'Total stems' in column 0 not found in table data).",
      "Summary total amount could not be extracted (pattern 'Totals' found, but value not in column 2 as per rule)."
    ],
    "expected_boxes": 4,
    "expected_stems": null,
    "expected_amount": null
  }
}
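The processing_report above compares sums over the extracted items with the values in summary. A rough sketch of how such a report could be assembled is given below, assuming stems are summed from quantity and amount from total_price. The function name `build_report`, the 0.01 tolerance, and the wording of the discrepancy strings are assumptions, not the tool's actual logic.

```python
def build_report(items, summary, tol=0.01):
    """Compare per-item sums with the invoice summary (illustrative only)."""
    sums = {
        "boxes": sum(i.get("boxes") or 0 for i in items),
        "stems": sum(i.get("quantity") or 0 for i in items),
        "amount": sum(i.get("total_price") or 0 for i in items),
    }
    expected = {
        "boxes": summary.get("total_boxes"),
        "stems": summary.get("total_stems"),
        "amount": summary.get("total_amount"),
    }
    discrepancies = []
    if not items:
        discrepancies.append("no items extracted")
    for key, actual in sums.items():
        exp = expected[key]
        if exp is None:
            # A missing expected value cannot be verified, so flag it.
            discrepancies.append(f"{key}: expected value missing")
        elif abs(actual - exp) > tol:
            discrepancies.append(f"{key}: {actual} vs {exp}")
    return {
        "status": "error" if discrepancies else "ok",
        "sum_boxes": sums["boxes"],
        "sum_stems": sums["stems"],
        "sum_amount": sums["amount"],
        "items_count": len(items),
        "discrepancies": discrepancies,
        "expected_boxes": expected["boxes"],
        "expected_stems": expected["stems"],
        "expected_amount": expected["amount"],
    }


# Values taken from the Pass 1 JSON above: no items, only total_boxes known.
report = build_report(
    [], {"total_boxes": 4, "total_stems": None, "total_amount": None}
)
```

With the Pass 1 input, this sketch yields status "error", matching the report above, although the real discrepancy messages are more detailed.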
Result (Pass 2)
Report: ok
items=7, boxes=7/7.0, stems=3500.0/3500.0, amount=830.0/830.0
notes: Invoice type: hybrid (the data is merged into a single OCR line, but logically this is a flat invoice); 7 physical boxes detected; multi-box expansion applied to 2 positions; unique box_number values generated for duplicates.
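The notes mention multi-box expansion and unique box_number generation. One way this could work is sketched below, assuming a multi-box row is split evenly into one item per physical box and that repeated box_number values receive a numeric suffix. The function `expand_multibox`, the even split, and the pre-expansion rows in the example are hypothetical; the page does not show the pre-expansion data.

```python
from collections import Counter


def expand_multibox(items):
    """Split rows with boxes > 1 into single-box items, then de-duplicate box_number."""
    expanded = []
    for item in items:
        n = int(item.get("boxes") or 1)
        for _ in range(n):
            single = dict(item)
            single["boxes"] = 1
            # Assumption: stems and amount are divided evenly across the boxes.
            single["quantity"] = item["quantity"] / n
            single["total_price"] = round(item["total_price"] / n, 2)
            expanded.append(single)

    # Append -1, -2, ... only to box numbers that occur more than once.
    counts = Counter(i["box_number"] for i in expanded)
    seen = Counter()
    for i in expanded:
        base = i["box_number"]
        if counts[base] > 1:
            seen[base] += 1
            i["box_number"] = f"{base}-{seen[base]}"
    return expanded


# Hypothetical pre-expansion rows, for illustration only.
rows = [
    {"box_number": "1 H", "boxes": 1, "quantity": 400.0, "total_price": 140.0},
    {"box_number": "2 V", "boxes": 2, "quantity": 1000.0, "total_price": 200.0},
]
expanded = expand_multibox(rows)
```

In this sketch the second row becomes two items numbered "2 V-1" and "2 V-2", each with 500 stems, which resembles the naming pattern in the JSON that follows.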
{
  "awb": "PACIFIC AIR",
  "items": [
    {
      "boxes": 1,
      "length": "50",
      "variety": "HOT PINK GOTCHA",
      "category": "ROSE",
      "quantity": 400.0,
      "box_number": "1 H",
      "plantation": "MECHTA OOO.-RU",
      "unit_price": 0.35,
      "box_marking": "BUKETOPT",
      "total_price": 140.0
    },
    {
      "boxes": 1,
      "length": "50",
      "variety": "HOT PINK GOTCHA",
      "category": "ROSE",
      "quantity": 500.0,
      "box_number": "1 V-1",
      "plantation": "MECHTA OOO.-RU",
      "unit_price": 0.32,
      "box_marking": "BUKETOPT",
      "total_price": 160.0
    },
    {
      "boxes": 1,
      "length": null,
      "variety": "LIGHT PINK NENA",
      "category": "ROSE",
      "quantity": 500.0,
      "box_number": "1 V-2",
      "plantation": "MECHTA OOO.-RU",
      "unit_price": 0.22,
      "box_marking": "BUKETOPT",
      "total_price": 110.0
    },
    {
      "boxes": 1,
      "length": "50",
      "variety": "CREAM VENDELA",
      "category": "ROSE",
      "quantity": 500.0,
      "box_number": "2 V-1",
      "plantation": "MECHTA OOO.-RU",
      "unit_price": 0.2,
      "box_marking": "BUKETOPT",
      "total_price": 100.0
    },
    {
      "boxes": 1,
      "length": "50",
      "variety": "CREAM VENDELA",
      "category": "ROSE",
      "quantity": 500.0,
      "box_number": "2 V-2",
      "plantation": "MECHTA OOO.-RU",
      "unit_price": 0.2,
      "box_marking": "BUKETOPT",
      "total_price": 100.0
    },
    {
      "boxes": 1,
      "length": "50",
      "variety": "CREAM VENDELA",
      "category": "ROSE",
      "quantity": 550.0,
      "box_number": "2 B-1",
      "plantation": "MECHTA OOO.-RU",
      "unit_price": 0.2,
      "box_marking": "BUKETOPT",
      "total_price": 110.0
    },
    {
      "boxes": 1,
      "length": "50",
      "variety": "CREAM VENDELA",
      "category": "ROSE",
      "quantity": 550.0,
      "box_number": "2 B-2",
      "plantation": "MECHTA OOO.-RU",
      "unit_price": 0.2,
      "box_marking": "BUKETOPT",
      "total_price": 110.0
    }
  ],
  "country": "Republic of Kazakhstan",
  "summary": {
    "total_boxes": 7.0,
    "total_stems": 3500.0,
    "total_amount": 830.0
  },
  "supplier": "MECHTA OOO.-RU",
  "invoice_date": "02/19/2026",
  "invoice_number": "0261178",
  "processing_report": {
    "notes": "Invoice type: hybrid (the data is merged into a single OCR line, but logically this is a flat invoice); 7 physical boxes detected; multi-box expansion applied to 2 positions; unique box_number values generated for duplicates.",
    "status": "ok",
    "sum_boxes": 7,
    "sum_stems": 3500.0,
    "sum_amount": 830.0,
    "items_count": 7,
    "discrepancies": [],
    "expected_boxes": 7.0,
    "expected_stems": 3500.0,
    "expected_amount": 830.0
  }
}
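The Pass 2 figures can be cross-checked line by line: each total_price should equal quantity times unit_price, and the item sums should match the summary. The sketch below does this for the seven items above; the helper name `check_line_totals` and the 0.005 tolerance are assumptions made for illustration.

```python
def check_line_totals(items, tol=0.005):
    """Return the indexes of items where quantity * unit_price != total_price."""
    bad = []
    for idx, (quantity, unit_price, total_price) in enumerate(items):
        # Compare with a tolerance because the prices are floats.
        if abs(quantity * unit_price - total_price) > tol:
            bad.append(idx)
    return bad


# (quantity, unit_price, total_price) copied from the Pass 2 items above.
items = [
    (400.0, 0.35, 140.0),
    (500.0, 0.32, 160.0),
    (500.0, 0.22, 110.0),
    (500.0, 0.2, 100.0),
    (500.0, 0.2, 100.0),
    (550.0, 0.2, 110.0),
    (550.0, 0.2, 110.0),
]

bad_lines = check_line_totals(items)
total_stems = sum(q for q, _, _ in items)
total_amount = sum(t for _, _, t in items)
```

For this data no line is flagged, and the sums come to 3500 stems and 830 in amount, in agreement with the summary block.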
History (last 20)
ID File Date
24 0261178.PDF 2026-02-23 17:08:43.771770+00:00 Excel