  1. Rocker Network Switch Register Programming Guide
  2. ************************************************
  3. ..
  4. Copyright (c) Scott Feldman <sfeldma@gmail.com>
  5. Copyright (c) Neil Horman <nhorman@tuxdriver.com>
  6. Version 0.11, 12/29/2014
  7. This program is free software; you can redistribute it and/or modify
  8. it under the terms of the GNU General Public License as published by
  9. the Free Software Foundation; either version 2 of the License, or
  10. (at your option) any later version.
  11. This program is distributed in the hope that it will be useful,
  12. but WITHOUT ANY WARRANTY; without even the implied warranty of
  13. MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
  14. GNU General Public License for more details.
  15. Introduction
  16. ============
  17. Overview
  18. --------
  19. This document describes the hardware/software interface for the Rocker switch
  20. device. The intended audience is authors of OS drivers and device emulation
  21. software.
  22. Notations and Conventions
  23. -------------------------
  24. * In register descriptions, [n:m] indicates a range from bit n to bit m,
  25. inclusive.
  26. * Use of leading 0x indicates a hexadecimal number.
  27. * Use of leading 0b indicates a binary number.
  28. * The use of RSVD or Reserved indicates that a bit or field is reserved for
  29. future use.
  30. * Field width is in bytes, unless otherwise noted.
  31. * Registers are (R) read-only, (R/W) read/write, (W) write-only, or (COR) clear
  32. on read.
  33. * TLV values in network-byte-order are designated with (N).
  34. PCI Configuration Registers
  35. ===========================
  36. PCI Configuration Space
  37. -----------------------
  38. Each switch instance registers as a PCI device with PCI configuration space::
  39. offset width description value
  40. ---------------------------------------------
  41. 0x0 2 Vendor ID 0x1b36
  42. 0x2 2 Device ID 0x0006
  43. 0x4 4 Command/Status
  44. 0x8 1 Revision ID 0x01
  45. 0x9 3 Class code 0x2800
  46. 0xC 1 Cache line size
  47. 0xD 1 Latency timer
  48. 0xE 1 Header type
  49. 0xF 1 Built-in self test
  50. 0x10 4 Base address low
  51. 0x14 4 Base address high
  52. 0x18-28 Reserved
  53. 0x2C 2 Subsystem vendor ID *
  54. 0x2E 2 Subsystem ID *
  55. 0x30-38 Reserved
  56. 0x3C 1 Interrupt line
  57. 0x3D 1 Interrupt pin 0x00
  58. 0x3E 1 Min grant 0x00
  59. 0x3F 1 Max latency 0x00
  60. 0x40 1 TRDY timeout
  61. 0x41 1 Retry count
  62. 0x42 2 Reserved
  63. * Assigned by sub-system implementation
  64. Memory-Mapped Register Space
  65. ============================
  66. There are two memory-mapped BARs. BAR0 maps device register space and is
  67. 0x2000 in size. BAR1 maps MSI-X vector and PBA tables and is also 0x2000 in
  68. size, allowing for 256 MSI-X vectors.
  69. All registers are 4 or 8 bytes long. It is assumed host software will access 4
  70. byte registers with one 4-byte access, and 8 byte registers with either two
  71. 4-byte accesses or a single 8-byte access. In the case of two 4-byte accesses,
  72. access must be lower and then upper 4-bytes, in that order.
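The access-ordering rule can be sketched as follows. This is a minimal illustration, not driver code: `bar0` is a hypothetical `bytearray` standing in for the mapped BAR0 region, and the helper names are assumptions introduced here. It writes a 64-bit register as two little-endian 4-byte accesses, lower half first.

```python
import struct

def split_reg64(value):
    """Return the (low, high) 32-bit halves of a 64-bit register value."""
    if not 0 <= value < 1 << 64:
        raise ValueError("value does not fit in 64 bits")
    return value & 0xFFFFFFFF, value >> 32

def write_reg64(bar0, offset, value):
    """Write a 64-bit register as two 4-byte accesses, lower half first."""
    low, high = split_reg64(value)
    struct.pack_into("<I", bar0, offset, low)       # lower 4 bytes first
    struct.pack_into("<I", bar0, offset + 4, high)  # then upper 4 bytes

# bar0 is a stand-in buffer (assumption), sized like BAR0 (0x2000 bytes).
bar0 = bytearray(0x2000)
write_reg64(bar0, 0x0018, 0x1122334455667788)
```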
  73. BAR0 device register space is organized as follows::
  74. offset description
  75. ------------------------------------------------------
  76. 0x0000-0x000f Bogus registers to catch misbehaving
  77. drivers. Writes do nothing. Reads
  78. back as 0xDEADBABE.
  79. 0x0010-0x00ff Test registers
  80. 0x0300-0x03ff General purpose registers
  81. 0x1000-0x1fff Descriptor control
  82. Holes in register space are reserved. Writes to reserved registers do nothing.
  83. Reads to reserved registers read back as 0.
  84. No fancy stuff like write-combining is enabled on any of the registers.
  85. BAR1 MSI-X register space is organized as follows::
  86. offset description
  87. ------------------------------------------------------
  88. 0x0000-0x0fff MSI-X vector table (256 vectors total)
  89. 0x1000-0x1fff MSI-X PBA table
  90. Interrupts, DMA, and Endianness
  91. ===============================
  92. PCI Interrupts
  93. --------------
  94. The device supports only MSI-X interrupts. BAR1 memory-mapped region contains
  95. the MSI-X vector and PBA tables, with support for up to 256 MSI-X vectors.
  96. The vector assignment is::
  97. vector description
  98. -----------------------------------------------------
  99. 0 Command descriptor ring completion
  100. 1 Event descriptor ring completion
  101. 2 Test operation completion
  102. 3 RSVD
  103. 4-255 Tx and Rx descriptor ring completion
  104. Tx vector is even
  105. Rx vector is odd
  106. An MSI-X vector table entry is 16 bytes::
  107. field offset width description
  108. -------------------------------------------------------------
  109. lower_addr 0x0 4 [31:2] message address[31:2]
  110. [1:0] Rsvd (4 byte alignment
  111. required)
  112. upper_addr 0x4 4 [31:19] Rsvd
  113. [14:0] message address[46:32]
  114. data 0x8 4 message data[31:0]
  115. control 0xc 4 [31:1] Rsvd
  116. [0] mask (0 = enable,
  117. 1 = masked)
  118. Software should install the Interrupt Service Routine (ISR) before any ports
  119. are enabled or any commands are issued on the command ring.
  120. DMA Operations
  121. --------------
  122. DMA operations are used for packet DMA to/from the CPU, command and event
  123. processing. Command processing includes statistical counters and table dumps,
  124. table insertion/deletion, and more. Event processing provides an async
  125. notification method for device-originating events. Each DMA operation has a
  126. set of control registers to manage a descriptor ring. The descriptor rings are
  127. allocated from contiguous host DMA-able memory and registers specify the ring's
  128. base address, size, and current head and tail indices. Software always writes
  129. the head, and hardware always writes the tail.
  130. The high-order bit of DMA_DESC_COMP_ERR is used to mark hardware completion
  131. of a descriptor. Software will clear this bit when posting a descriptor to the
  132. ring, and hardware will set this bit when the descriptor is complete.
  133. Descriptor ring sizes must be a power of 2 and range from 2 to 64K entries.
  134. Descriptor rings' base address must be 8-byte aligned. Descriptors must be
  135. packed within the ring. Each descriptor in each ring must also be aligned on an 8
  136. byte boundary. Each descriptor ring will have these registers::
  137. DMA_DESC_xxx_BASE_ADDR, offset 0x1000 + (x * 32), 64-bit, (R/W)
  138. DMA_DESC_xxx_SIZE, offset 0x1008 + (x * 32), 32-bit, (R/W)
  139. DMA_DESC_xxx_HEAD, offset 0x100c + (x * 32), 32-bit, (R/W)
  140. DMA_DESC_xxx_TAIL, offset 0x1010 + (x * 32), 32-bit, (R)
  141. DMA_DESC_xxx_CTRL, offset 0x1014 + (x * 32), 32-bit, (W)
  142. DMA_DESC_xxx_CREDITS, offset 0x1018 + (x * 32), 32-bit, (R/W)
  143. DMA_DESC_xxx_RSVD1, offset 0x101c + (x * 32), 32-bit, (R/W)
  144. Where x is descriptor ring index::
  145. index ring
  146. --------------------
  147. 0 CMD
  148. 1 EVENT
  149. 2 TX (port 0)
  150. 3 RX (port 0)
  151. 4 TX (port 1)
  152. 5 RX (port 1)
  153. .
  154. .
  155. .
  156. 124 TX (port 61)
  157. 125 RX (port 61)
  158. 126 Resv
  159. 127 Resv
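The register offsets and ring indices listed above can be computed mechanically. The sketch below assumes nothing beyond the two tables; the function and dictionary names are hypothetical and chosen only for illustration.

```python
# Base offsets for ring index 0, taken from the register list above.
RING_REGS = {
    "BASE_ADDR": 0x1000,
    "SIZE": 0x1008,
    "HEAD": 0x100C,
    "TAIL": 0x1010,
    "CTRL": 0x1014,
    "CREDITS": 0x1018,
}
RING_STRIDE = 32  # each ring's register block is 32 bytes apart

def ring_index(kind, port=0):
    """Map a ring kind (and zero-based port for TX/RX) to its ring index."""
    if kind == "CMD":
        return 0
    if kind == "EVENT":
        return 1
    if kind in ("TX", "RX"):
        if not 0 <= port <= 61:
            raise ValueError("port must be in 0..61")
        return 2 + 2 * port + (1 if kind == "RX" else 0)
    raise ValueError("unknown ring kind")

def ring_reg_offset(name, index):
    """Return the BAR0 offset of register `name` for ring `index`."""
    if not 0 <= index <= 127:
        raise ValueError("ring index must be in 0..127")
    return RING_REGS[name] + index * RING_STRIDE
```

For example, `ring_reg_offset("HEAD", ring_index("RX", 0))` gives the HEAD register for the port 0 RX ring.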
  160. Writing BASE_ADDR or SIZE will reset HEAD and TAIL to zero. HEAD cannot be
  161. written past TAIL. To do so would wrap the ring. An empty ring is when HEAD
  162. == TAIL. A full ring is when HEAD is one position behind TAIL. Both HEAD and
  163. TAIL increment and modulo wrap at the ring size.
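The empty, full, and wrap rules above reduce to a few lines of index arithmetic. This is a sketch with hypothetical helper names; it also checks the ring-size constraint (power of 2, from 2 to 64K entries).

```python
def valid_ring_size(size):
    """Ring sizes must be a power of 2 between 2 and 64K entries."""
    return 2 <= size <= 65536 and size & (size - 1) == 0

def ring_advance(index, size):
    """Increment HEAD or TAIL with modulo wrap at the ring size."""
    return (index + 1) % size

def ring_is_empty(head, tail):
    """An empty ring is when HEAD == TAIL."""
    return head == tail

def ring_is_full(head, tail, size):
    """A full ring is when HEAD is one position behind TAIL."""
    return ring_advance(head, size) == tail
```

Because one slot is always left unused to distinguish full from empty, a ring of N entries can hold at most N - 1 outstanding descriptors.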
  164. CTRL register bits::
  165. bit name description
  166. ------------------------------------------------------------------------
  167. [0] CTRL_RESET Reset the descriptor ring
  168. [31:1] Reserved
  169. All descriptor types share some common fields::
  170. field width description
  171. -------------------------------------------------------------------
  172. DMA_DESC_BUF_ADDR 8 Phys addr of desc payload, 8-byte
  173. aligned
  174. DMA_DESC_COOKIE 8 Desc cookie for completion matching,
  175. upper-most bit is reserved
  176. DMA_DESC_BUF_SIZE 2 Desc payload size in bytes
  177. DMA_DESC_TLV_SIZE 2 Desc payload total size in bytes
  178. used for TLVs. Must be <=
  179. DMA_DESC_BUF_SIZE.
  180. DMA_DESC_COMP_ERR 2 Completion status of associated
  181. desc payload. High order bit is
  182. clear on new descs, toggled by
  183. hw for completed items.
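A driver would typically test the high-order bit of DMA_DESC_COMP_ERR to detect completion. The sketch below assumes the field is read as a 16-bit integer and, as a further assumption not stated in this document, that the remaining 15 bits carry the status code. All names are hypothetical.

```python
COMP_DONE = 1 << 15  # high-order bit of the 2-byte DMA_DESC_COMP_ERR field

def desc_is_complete(comp_err):
    """True once hardware has marked the descriptor complete."""
    return bool(comp_err & COMP_DONE)

def desc_status(comp_err):
    """Status bits with the completion bit masked off (assumed layout)."""
    return comp_err & (COMP_DONE - 1)

def clear_for_posting(comp_err):
    """Software clears the completion bit before posting a descriptor."""
    return comp_err & ~COMP_DONE & 0xFFFF
```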
  184. To support forward- and backward-compatibility, descriptor and completion
  185. payloads are specified in TLV format. Fields are packed with Type=field name,
  186. Length=field length, and Value=field value. Software will ignore unknown fields
  187. filled in by the switch. Likewise, the switch will ignore unknown fields
  188. filled in by software.
  189. Descriptor payload buffer is 8-byte aligned and TLVs are 8-byte aligned. The
  190. value within a TLV is also 8-byte aligned. The (packed, 8 byte) TLV header is::
  191. field width description
  192. -----------------------------
  193. type 4 TLV type
  194. len 2 TLV value length
  195. pad 2 Reserved
  196. The alignment requirements for descriptors and TLVs are to avoid unaligned
  197. access exceptions in software. Note that the payload for each TLV is also
  198. 8 byte aligned.
  199. Figure 1 shows an example descriptor buffer with two TLVs::
  200. <------- 8 bytes ------->
  201. 8-byte +––––+ +–––––––––––+–––––+–––––+ +–+
  202. align | type | len | pad | TLV#1 hdr |
  203. +–––––––––––+–––––+–––––+ (len=22) |
  204. | | |
  205. | value | TLV#1 value |
  206. | | (padded to 8-byte |
  207. | +–––––+ alignment) |
  208. | |/////| |
  209. 8-byte +––––+ +–––––––––––+–––––––––––+ |
  210. align | type | len | pad | TLV#2 hdr DESC_BUF_SIZE
  211. +–––––+–––––+–––––+–––––+ (len=2) |
  212. |value|/////////////////| TLV#2 value |
  213. +–––––+/////////////////| |
  214. |///////////////////////| |
  215. |///////////////////////| |
  216. |///////////////////////| |
  217. |////////unused/////////| |
  218. |////////space//////////| |
  219. |///////////////////////| |
  220. |///////////////////////| |
  221. |///////////////////////| |
  222. +–––––––––––––––––––––––+ +–+
  223. fig. 1
  224. TLVs can be nested within the NEST TLV type.
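The TLV layout in Figure 1 can be reproduced with a small packing helper. This is a sketch under stated assumptions: the type numbers used in the example are placeholders, not values defined by this document, and the header is encoded little-endian as described under Endianness.

```python
import struct

def pad8(length):
    """Round a length up to the next multiple of 8."""
    return (length + 7) & ~7

def pack_tlv(tlv_type, value):
    """Pack one TLV: 8-byte header (type, len, pad), then value padded to 8 bytes."""
    header = struct.pack("<IHH", tlv_type, len(value), 0)
    padding = bytes(pad8(len(value)) - len(value))
    return header + value + padding

# Mirrors Figure 1: a 22-byte value followed by a 2-byte value.
# Type numbers 1 and 2 are placeholders (assumption).
buf = pack_tlv(1, bytes(22)) + pack_tlv(2, b"\x01\x02")
```

The resulting length is what would go in DMA_DESC_TLV_SIZE, and it must not exceed DMA_DESC_BUF_SIZE.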
  225. Interrupt credits
  226. ^^^^^^^^^^^^^^^^^
  227. MSI-X vectors used for descriptor ring completions use a credit mechanism for
  228. efficient device, PCIe bus, OS and driver operations. Each descriptor ring has
  229. a credit count which represents the number of outstanding descriptors to be
  230. processed by the driver. As the device marks descriptors complete, the credit
  231. count is incremented. As the driver processes those outstanding descriptors,
  232. it returns credits back to the device. This way, the device knows the driver's
  233. progress and can decide whether to fire the next interrupt.
  234. When the credit count is zero, and the first descriptors are posted for the
  235. driver, a single interrupt is fired. Once the interrupt is fired, the
  236. interrupt is disabled (auto-masked*). In response to the interrupt, the driver
  237. will process descriptors and PIO write a returned credit value for that
  238. descriptor ring. If the driver returns all credits (the driver caught up with
  239. the device and there is no outstanding work), then the interrupt is unmasked,
  240. but not fired. If only partial credits are returned, the interrupt remains
  241. masked but the device generates an interrupt, signaling the driver that more
  242. outstanding work is available.
  243. (* this masking is unrelated to the MSI-X interrupt mask register)
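The credit mechanism can be modeled as a small state machine. The class below is an illustrative simulation of the behavior described above, not device or driver code; the class and attribute names are assumptions.

```python
class CreditRing:
    """Toy model of per-ring interrupt credits and auto-masking."""

    def __init__(self):
        self.credits = 0     # outstanding descriptors not yet returned
        self.masked = False  # auto-mask state (not the MSI-X mask bit)
        self.fired = 0       # number of interrupts generated so far

    def complete(self, count=1):
        """Device marks `count` descriptors complete."""
        self.credits += count
        if not self.masked:
            self.fired += 1     # single interrupt for the first completions
            self.masked = True  # auto-mask until the driver catches up

    def return_credits(self, count):
        """Driver returns `count` credits after processing descriptors."""
        if not 0 <= count <= self.credits:
            raise ValueError("cannot return more credits than outstanding")
        self.credits -= count
        if self.credits == 0:
            self.masked = False  # all caught up: unmask, do not fire
        else:
            self.fired += 1      # partial return: stay masked, fire again
```

Three completions followed by a partial return of two credits yield two interrupts in total; returning the last credit unmasks without firing.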
  244. Endianness
  245. ----------
  246. Device registers are hard-coded to little-endian (LE). The driver should
  247. convert to/from host endianness to LE for device register accesses.
  248. Descriptors are LE. Descriptor buffer TLVs will have LE type and length
  249. fields, but the value field can either be LE or network-byte-order, depending
  250. on context. TLV values containing network packet data will be in network-byte
  251. order. A TLV value containing a field or mask used to compare against network
  252. packet data is network-byte order. For example, flow match fields (and masks)
  253. are network-byte-order since they're matched directly, byte-by-byte, against
  254. network packet data. All non-network-packet TLV multi-byte values will be LE.
  255. TLV values in network-byte-order are designated with (N).
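As a small illustration of the two byte orders, the sketch below encodes a port number as a little-endian TLV value and a VLAN ID as a network-byte-order (N) value. The helper names are hypothetical.

```python
import struct

def le32(value):
    """Encode a 4-byte little-endian TLV value (e.g. a physical port number)."""
    return struct.pack("<I", value)

def net16(value):
    """Encode a 2-byte network-byte-order (N) TLV value (e.g. a VLAN ID)."""
    return struct.pack("!H", value)
```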
  256. Test Registers
  257. ==============
  258. Rocker has several test registers to support troubleshooting register access,
  259. interrupt generation, and DMA operations::
  260. TEST_REG, offset 0x0010, 32-bit (R/W)
  261. TEST_REG64, offset 0x0018, 64-bit (R/W)
  262. TEST_IRQ, offset 0x0020, 32-bit (R/W)
  263. TEST_DMA_ADDR, offset 0x0028, 64-bit (R/W)
  264. TEST_DMA_SIZE, offset 0x0030, 32-bit (R/W)
  265. TEST_DMA_CTRL, offset 0x0034, 32-bit (R/W)
  266. Reads to TEST_REG and TEST_REG64 will read a value equal to twice the last
  267. value written to the register. The 32-bit and 64-bit versions are for testing
  268. 32-bit and 64-bit host accesses.
  269. A vector can be written to TEST_IRQ and the device will generate an interrupt
  270. for that vector.
  271. To test basic DMA operations, allocate a DMA-able host buffer and put the
  272. buffer address into TEST_DMA_ADDR and size into TEST_DMA_SIZE. Then, write to
  273. TEST_DMA_CTRL to manipulate the buffer contents. TEST_DMA_CTRL operations are::
  274. operation value description
  275. -----------------------------------------------------------
  276. TEST_DMA_CTRL_CLEAR 1 clear buffer
  277. TEST_DMA_CTRL_FILL 2 fill buffer bytes with 0x96
  278. TEST_DMA_CTRL_INVERT 4 invert bytes in buffer
  279. Various buffer addresses and sizes should be tested to verify no address boundary
  280. issue exists. In particular, buffers that start on odd-8-byte boundary and/or
  281. span multiple PAGE sizes should be tested.
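A test harness can compute the buffer contents it expects after each TEST_DMA_CTRL operation and compare them with what the device wrote. The sketch below assumes that "clear" means zero-fill and that "invert" means bitwise inversion of every byte; the function name is hypothetical.

```python
TEST_DMA_CTRL_CLEAR = 1
TEST_DMA_CTRL_FILL = 2
TEST_DMA_CTRL_INVERT = 4

def expected_buffer(operation, before):
    """Return the buffer contents expected after a TEST_DMA_CTRL operation."""
    if operation == TEST_DMA_CTRL_CLEAR:
        return bytes(len(before))                # assumed: zero-filled
    if operation == TEST_DMA_CTRL_FILL:
        return bytes([0x96]) * len(before)       # every byte set to 0x96
    if operation == TEST_DMA_CTRL_INVERT:
        return bytes(b ^ 0xFF for b in before)   # assumed: bitwise inversion
    raise ValueError("unknown TEST_DMA_CTRL operation")
```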
  282. Ports
  283. =====
  284. Physical and Logical Ports
  285. ------------------------------------
  286. The switch supports up to 62 physical (front-panel) ports. Register
  287. PORT_PHYS_COUNT returns the actual number of physical ports available::
  288. PORT_PHYS_COUNT, offset 0x0304, 32-bit, (R)
  289. In addition to front-panel ports, the switch supports logical ports for
  290. tunnels.
  291. Front-panel ports and logical tunnel ports are mapped into a single 32-bit port
  292. space. A special CPU port is assigned port 0. The front-panel ports are
  293. mapped to ports 1-62. A special loopback port is assigned port 63. Logical
  294. tunnel ports are assigned ports 0x00010000-0x0001ffff.
  295. To summarize the port assignments::
  296. port mapping
  297. -------------------------------------------------------
  298. 0 CPU port (for packets to/from host CPU)
  299. 1-62 front-panel physical ports
  300. 63 loopback port
  301. 64-0x0000ffff RSVD
  302. 0x00010000-0x0001ffff logical tunnel ports
  303. 0x00020000-0xffffffff RSVD
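The port map above translates directly into a classification helper. This is a sketch; the function name and the returned labels are assumptions chosen for illustration.

```python
def classify_port(port):
    """Classify a 32-bit port number according to the port assignment table."""
    if not 0 <= port <= 0xFFFFFFFF:
        raise ValueError("port must fit in 32 bits")
    if port == 0:
        return "cpu"
    if port <= 62:
        return "front-panel"
    if port == 63:
        return "loopback"
    if 0x00010000 <= port <= 0x0001FFFF:
        return "tunnel"
    return "reserved"
```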
  304. Physical Port Mode
  305. ------------------
  306. Switch front-panel ports operate in a mode. Currently, the only mode is
  307. OF-DPA. OF-DPA[1] mode is based on OpenFlow Data Plane Abstraction (OF-DPA)
  308. Abstract Switch Specification, Version 1.0, from Broadcom Corporation. To
  309. set/get the mode for front-panel ports, see port settings, below.
  310. Port Settings
  311. -------------
  312. Link status for all front-panel ports is available via PORT_PHYS_LINK_STATUS::
  313. PORT_PHYS_LINK_STATUS, offset 0x0310, 64-bit, (R)
  314. Value is port bitmap. Bits 0 and 63 always read 0. Bits 1-62
  315. read 1 for link UP and 0 for link DOWN for respective front-panel ports.
  316. Other properties for front-panel ports are available via DMA CMD descriptors::
  317. Get PORT_SETTINGS descriptor:
  318. field width description
  319. ----------------------------------------------
  320. PORT_SETTINGS 2 CMD_GET
  321. PPORT 4 Physical port #
  322. Get PORT_SETTINGS completion:
  323. field width description
  324. ----------------------------------------------
  325. PPORT 4 Physical port #
  326. SPEED 4 Current port interface speed, in Mbps
  327. DUPLEX 1 1 = Full, 0 = Half
  328. AUTONEG 1 1 = enabled, 0 = disabled
  329. MACADDR 6 Port MAC address
  330. MODE 1 0 = OF-DPA
  331. LEARNING 1 MAC address learning on port
  332. 1 = enabled
  333. 0 = disabled
  334. PHYS_NAME <var> Physical port name (string)
  335. Set PORT_SETTINGS descriptor:
  336. field width description
  337. ----------------------------------------------
  338. PORT_SETTINGS 2 CMD_SET
  339. PPORT 4 Physical port #
  340. SPEED 4 Port interface speed, in Mbps
  341. DUPLEX 1 1 = Full, 0 = Half
  342. AUTONEG 1 1 = enabled, 0 = disabled
  343. MACADDR 6 Port MAC address
  344. MODE 1 0 = OF-DPA
  345. Port Enable
  346. -----------
  347. Front-panel ports are initially disabled, which means port ingress and egress
  348. packets will be dropped. To enable or disable a port, use PORT_PHYS_ENABLE::
  349. PORT_PHYS_ENABLE: offset 0x0318, 64-bit, (R/W)
  350. Value is bitmap of first 64 ports. Bits 0 and 63 are ignored
  351. and always read as 0. Write 1 to enable port; write 0 to disable it.
  352. Default is 0.
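Both PORT_PHYS_ENABLE and PORT_PHYS_LINK_STATUS use the same bitmap layout, with bit N corresponding to front-panel port N. A minimal sketch of building and decoding such a bitmap follows; the helper names are hypothetical.

```python
def port_bitmap(ports):
    """Build a 64-bit bitmap with one bit set per front-panel port (1-62)."""
    bitmap = 0
    for port in ports:
        if not 1 <= port <= 62:
            raise ValueError("front-panel ports are numbered 1-62")
        bitmap |= 1 << port
    return bitmap

def ports_in_bitmap(bitmap):
    """List the front-panel ports whose bits are set; bits 0 and 63 are ignored."""
    return [port for port in range(1, 63) if (bitmap >> port) & 1]
```

Writing `port_bitmap([1, 2])` to PORT_PHYS_ENABLE would enable ports 1 and 2 and disable every other port, since a 0 bit disables the port.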
  353. Switch Control
  354. ==============
  355. This section covers switch-wide register settings.
  356. Control
  357. -------
  358. This register is used for low level control of the switch::
  359. CONTROL: offset 0x0300, 32-bit, (W)
  360. bit name description
  361. ------------------------------------------------------------------------
  362. [0] CONTROL_RESET If set, device will perform reset
  363. [31:1] Reserved
  364. Switch ID
  365. ---------
  366. The switch has a SWITCH_ID to be used by software to uniquely identify the
  367. switch::
  368. SWITCH_ID: offset 0x0320, 64-bit, (R)
  369. Value is opaque to switch software and no special encoding is implied.
  370. Events
  371. ======
  372. Non-I/O asynchronous events from the device are notified to the host using the
  373. event ring. The TLV structure for events is::
  374. field width description
  375. ---------------------------------------------------
  376. TYPE 4 Event type, one of:
  377. 1: LINK_CHANGED
  378. 2: MAC_VLAN_SEEN
  379. INFO <nest> Event info (details below)
  380. Link Changed Event
  381. ------------------
  382. When link status changes on a physical port, this event is generated::
  383. field width description
  384. ---------------------------------------------------
  385. INFO <nest>
  386. PPORT 4 Physical port
  387. LINKUP 1 Link status:
  388. 0: down
  389. 1: up
  390. MAC VLAN Seen Event
  391. -------------------
  392. When a packet ingresses on a port and the source MAC/VLAN isn't known to the
  393. device, the device will generate this event. In response to the event, the
  394. driver should install to the device the MAC/VLAN on the port into the bridge
  395. table. Once installed, the MAC/VLAN is known on the port and this event will
  396. no longer be generated.
  397. ::
  398. field width description
  399. ---------------------------------------------------
  400. INFO <nest>
  401. PPORT 4 Physical port
  402. MAC 6 MAC address
  403. VLAN 2 VLAN ID
  404. CPU Packet Processing
  405. =====================
  406. Ingress packets directed to the host CPU for further processing are delivered
  407. in the DMA RX ring. Likewise, host CPU originating packets destined to egress
  408. on switch ports are scheduled by software using the DMA TX ring.
  409. Tx Packet Processing
  410. --------------------
  411. Software schedules packets for egress on switch ports using the DMA TX ring. A
  412. TX descriptor buffer describes the packet location and size in host DMA-able
  413. memory, the destination port, and any hardware-offload functions (such as L3
  414. payload checksum offload). Software then bumps the descriptor head to signal
  415. hardware of new Tx work. In response, hardware will DMA read Tx descriptors up
  416. to head, DMA read descriptor buffer and packet data, perform offloading
  417. functions, and finally frame packet on wire (network). Once packet processing
  418. is complete, hardware will writeback status to descriptor(s) to signal to
  419. software that Tx is complete and software resources (e.g. skb) backing packet
  420. can be released.
  421. Figure 2 shows an example 3-fragment packet queued with one Tx descriptor. A
  422. TLV is used for each packet fragment::
  423. pkt frag 1
  424. +–––––––+ +–+
  425. +–––+ | |
  426. desc buf | | | |
  427. +––––––––+ | | | |
  428. Tx ring +–––+ +–––––+ | | |
  429. +–––––––––+ | | TLVs | +–––––––+ |
  430. | +–––+ +––––––––+ pkt frag 2 |
  431. | desc 0 | | +–––––+ +–––––––+ |
  432. +–––––––––+ | TLVs | +–––+ | |
  433. head+–+ | +––––––––+ | | |
  434. | desc 1 | | +–––––+ +–––––––+ |pkt
  435. +–––––––––+ | TLVs | | |
  436. | | +––––––––+ | pkt frag 3 |
  437. | | | +–––––––+ |
  438. +–––––––––+ +–––+ | |
  439. | | | | |
  440. | | | | |
  441. +–––––––––+ | | |
  442. | | | | |
  443. | | | | |
  444. +–––––––––+ | | |
  445. | | +–––––––+ +–+
  446. | |
  447. +–––––––––+
  448. fig 2.
  449. The TLVs for Tx descriptor buffer are::
  450. field width description
  451. ---------------------------------------------------------------------
  452. PPORT 4 Destination physical port #
  453. TX_OFFLOAD 1 Hardware offload modes:
  454. 0: no offload
  455. 1: insert IP csum (ipv4 only)
  456. 2: insert TCP/UDP csum
  457. 3: L3 csum calc and insert
  458. into csum offset (TX_L3_CSUM_OFF)
  459. 16-bit 1's complement csum value.
  460. IPv4 pseudo-header and IP
  461. already calculated by OS
  462. and inserted.
  463. 4: TSO (TCP Segmentation Offload)
  464. TX_L3_CSUM_OFF 2 For L3 csum offload mode, the offset,
  465. from the beginning of the packet,
  466. of the csum field in the L3 header
  467. TX_TSO_MSS 2 For TSO offload mode, the
  468. Maximum Segment Size in bytes
  469. TX_TSO_HDR_LEN 2 For TSO offload mode, the
  470. length of ethernet, IP, and
  471. TCP/UDP headers, including IP
  472. and TCP options.
  473. TX_FRAGS <array> Packet fragments
  474. TX_FRAG <nest> Packet fragment
  475. TX_FRAG_ADDR 8 DMA address of packet fragment
  476. TX_FRAG_LEN 2 Packet fragment length
  477. Possible status return codes in descriptor on completion are::
  478. DESC_COMP_ERR reason
  479. --------------------------------------------------------------------
  480. 0 OK
  481. -ROCKER_ENXIO address or data read err on desc buf or packet
  482. fragment
  483. -ROCKER_EINVAL bad pport or TSO or csum offloading error
  484. -ROCKER_ENOMEM no memory for internal staging tx fragment
  485. Rx Packet Processing
  486. --------------------
  487. Packets ingressing on switch ports that are not forwarded by the switch, but
  488. rather directed to the host CPU for further processing, are delivered in the DMA
  489. RX ring. Rx descriptor buffers are allocated by software and placed on the
  490. ring. Hardware will fill Rx descriptor buffers with packet data, write the
  491. completion, and signal to software that a new packet is ready. Since Rx packet
  492. size is not known a-priori, the Rx descriptor buffer must be allocated for
  493. worst-case packet size. A single Rx descriptor will contain the entire Rx
  494. packet data in one RX_FRAG. Other Rx TLVs describe any hardware offloads
  495. performed on the packet, such as checksum validation.
  496. The TLVs for Rx descriptor buffer are::
  497. field width description
  498. ---------------------------------------------------
  499. PPORT 4 Source physical port #
  500. RX_FLAGS 2 Packet parsing flags:
  501. (1 << 0): IPv4 packet
  502. (1 << 1): IPv6 packet
  503. (1 << 2): csum calculated
  504. (1 << 3): IPv4 csum good
  505. (1 << 4): IP fragment
  506. (1 << 5): TCP packet
  507. (1 << 6): UDP packet
  508. (1 << 7): TCP/UDP csum good
  509. (1 << 8): Offload forward
  510. RX_CSUM 2 IP calculated checksum:
  511. IPv4: IP payload csum
  512. IPv6: header and payload csum
  513. (Only valid if RX_FLAGS:csum calc is set)
  514. RX_FRAG_ADDR 8 DMA address of packet fragment
  515. RX_FRAG_MAX_LEN 2 Packet maximum fragment length
  516. RX_FRAG_LEN 2 Actual packet fragment length after receive
  517. Offload forward RX_FLAG indicates the device has already forwarded the packet
  518. so the host CPU should not also forward the packet.
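The RX_FLAGS bit assignments can be decoded into readable names as sketched below. The bit positions come from the table above; the short names are assumptions introduced for this example.

```python
# Index in the list is the bit position from the RX_FLAGS table.
RX_FLAG_NAMES = [
    "ipv4",               # (1 << 0)
    "ipv6",               # (1 << 1)
    "csum_calc",          # (1 << 2)
    "ipv4_csum_good",     # (1 << 3)
    "ip_frag",            # (1 << 4)
    "tcp",                # (1 << 5)
    "udp",                # (1 << 6)
    "tcp_udp_csum_good",  # (1 << 7)
    "offload_fwd",        # (1 << 8)
]

def decode_rx_flags(flags):
    """Return the set of flag names whose bits are set in RX_FLAGS."""
    return {name for bit, name in enumerate(RX_FLAG_NAMES) if flags & (1 << bit)}

def host_should_forward(flags):
    """The host should not forward a packet the device already forwarded."""
    return "offload_fwd" not in decode_rx_flags(flags)
```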
  519. Possible status return codes in descriptor on completion are::
  520. DESC_COMP_ERR reason
  521. --------------------------------------------------------------------
  522. 0 OK
  523. -ROCKER_ENXIO address or data read err on desc buf
  524. -ROCKER_ENOMEM no memory for internal staging desc buf
  525. -ROCKER_EMSGSIZE Rx descriptor buffer wasn't big enough to contain
  526. packet data TLV and other TLVs.
  527. OF-DPA Mode
  528. ===========
  529. OF-DPA mode allows the switch to offload flow packet processing functions to
  530. hardware. An OpenFlow controller would communicate with an OpenFlow agent
  531. installed on the switch. The OpenFlow agent would (directly or indirectly)
  532. communicate with the Rocker switch driver, which in turn would program switch
  533. hardware with flow functionality, as defined in OF-DPA. The block diagram is::
  534. +–––––––––––––––----–––+
  535. | OF |
  536. | Remote Controller |
  537. +––––––––+––----–––––––+
  538. |
  539. |
  540. +––––––––+–––––––––+
  541. | OF |
  542. | Local Agent |
  543. +––––––––––––––––––+
  544. | |
  545. | Rocker Driver |
  546. +––––––––––––––––––+
  547. <this spec>
  548. +––––––––––––––––––+
  549. | |
  550. | Rocker Switch |
  551. +––––––––––––––––––+
  552. To participate in flow functions, ports must be configured for OF-DPA mode
  553. during switch initialization.
  554. OF-DPA Flow Table Interface
  555. ---------------------------
  556. There are commands to add, modify, delete, and get stats of flow table entries.
  557. The commands are issued using the DMA CMD descriptor ring. The following
  558. commands are defined::
  559. CMD_ADD: add an entry to flow table
  560. CMD_MOD: modify an entry in flow table
  561. CMD_DEL: delete an entry from flow table
  562. CMD_GET_STATS: get stats for flow entry
  563. TLVs for add and modify commands are::
  564. field width description
  565. ----------------------------------------------------
  566. OF_DPA_CMD 2 CMD_[ADD|MOD]
  567. OF_DPA_TBL 2 Flow table ID
  568. 0: ingress port
  569. 10: vlan
  570. 20: termination mac
  571. 30: unicast routing
  572. 40: multicast routing
  573. 50: bridging
  574. 60: ACL policy
  575. OF_DPA_PRIORITY 4 Flow priority
  576. OF_DPA_HARDTIME 4 Hard timeout for flow
  577. OF_DPA_IDLETIME 4 Idle timeout for flow
  578. OF_DPA_COOKIE 8 Cookie
  579. Additional TLVs based on flow table ID:
  580. Table ID 0: ingress port::
  581. field width description
  582. ----------------------------------------------------
  583. OF_DPA_IN_PPORT 4 ingress physical port number
  584. OF_DPA_GOTO_TBL 2 goto table ID; zero to drop
  585. Table ID 10: vlan::
  586. field width description
  587. ----------------------------------------------------
  588. OF_DPA_IN_PPORT 4 ingress physical port number
  589. OF_DPA_VLAN_ID 2 (N) vlan ID
  590. OF_DPA_VLAN_ID_MASK 2 (N) vlan ID mask
  591. OF_DPA_GOTO_TBL 2 goto table ID; zero to drop
  592. OF_DPA_NEW_VLAN_ID 2 (N) new vlan ID
  593. Table ID 20: termination mac::
  594. field width description
  595. ----------------------------------------------------
  596. OF_DPA_IN_PPORT 4 ingress physical port number
  597. OF_DPA_IN_PPORT_MASK 4 ingress physical port number mask
  598. OF_DPA_ETHERTYPE 2 (N) must be either 0x0800 or 0x86dd
  599. OF_DPA_DST_MAC 6 (N) destination MAC
  600. OF_DPA_DST_MAC_MASK 6 (N) destination MAC mask
  601. OF_DPA_VLAN_ID 2 (N) vlan ID
  602. OF_DPA_VLAN_ID_MASK 2 (N) vlan ID mask
  603. OF_DPA_GOTO_TBL 2 only acceptable values are
  604. unicast or multicast routing
  605. table IDs
  606. OF_DPA_OUT_PPORT 2 if specified, must be
  607. controller, set zero otherwise
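The constraints on a termination MAC entry can be checked before the command is posted. The sketch below uses the flow table IDs listed earlier; the enum and function names are hypothetical.

```python
from enum import IntEnum

class OfDpaTable(IntEnum):
    """Flow table IDs from the OF_DPA_TBL field."""
    INGRESS_PORT = 0
    VLAN = 10
    TERMINATION_MAC = 20
    UNICAST_ROUTING = 30
    MULTICAST_ROUTING = 40
    BRIDGING = 50
    ACL_POLICY = 60

def valid_term_mac_entry(ethertype, goto_tbl):
    """Check the ethertype and goto-table constraints for table ID 20."""
    ethertype_ok = ethertype in (0x0800, 0x86DD)  # IPv4 or IPv6 only
    goto_ok = goto_tbl in (OfDpaTable.UNICAST_ROUTING,
                           OfDpaTable.MULTICAST_ROUTING)
    return ethertype_ok and goto_ok
```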
  608. Table ID 30: unicast routing::
  609. field width description
  610. ----------------------------------------------------
  611. OF_DPA_ETHERTYPE 2 (N) must be either 0x0800 or 0x86dd
  612. OF_DPA_DST_IP 4 (N) destination IPv4 address.
  613. Must be unicast address
  614. OF_DPA_DST_IP_MASK 4 (N) IP mask. Must be prefix mask
  615. OF_DPA_DST_IPV6 16 (N) destination IPv6 address.
  616. Must be unicast address
  617. OF_DPA_DST_IPV6_MASK 16 (N) IPv6 mask. Must be prefix mask
  618. OF_DPA_GOTO_TBL 2 goto table ID; zero to drop
  619. OF_DPA_GROUP_ID 4 data for GROUP action must
  620. be an L3 Unicast group entry
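The "must be prefix mask" requirement means the mask is a run of 1 bits followed only by 0 bits. One way to check this, sketched with a hypothetical helper that takes the mask as an integer in host order:

```python
def is_prefix_mask(mask, bits=32):
    """True if `mask` is contiguous 1s from the top bit down (a prefix mask).

    Use bits=32 for IPv4 masks and bits=128 for IPv6 masks.
    """
    full = (1 << bits) - 1
    if not 0 <= mask <= full:
        raise ValueError("mask does not fit in the given width")
    inverted = ~mask & full
    # The inverse of a prefix mask has the form 2**k - 1, so adding 1
    # produces a value that shares no bits with it.
    return inverted & (inverted + 1) == 0
```

The mask would still need to be serialized in network byte order, since the field is marked (N).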

Table ID 40: multicast routing::

    field                     width   description
    --------------------------------------------------------------------
    OF_DPA_ETHERTYPE          2       (N) must be either 0x0800 or 0x86dd
    OF_DPA_VLAN_ID            2       (N) vlan ID
    OF_DPA_SRC_IP             4       (N) source IPv4. Optional,
                                      can contain IPv4 address,
                                      must be completely masked
                                      if not used
    OF_DPA_SRC_IP_MASK        4       (N) IP Mask
    OF_DPA_DST_IP             4       (N) destination IPv4 address.
                                      Must be multicast address
    OF_DPA_SRC_IPV6           16      (N) source IPv6 Address. Optional.
                                      Can contain IPv6 address,
                                      must be completely masked
                                      if not used
    OF_DPA_SRC_IPV6_MASK      16      (N) IPv6 mask.
    OF_DPA_DST_IPV6           16      (N) destination IPv6 Address. Must
                                      be multicast address
    OF_DPA_GOTO_TBL           2       goto table ID; zero to drop
    OF_DPA_GROUP_ID           4       data for GROUP action must
                                      be an L3 multicast group entry

Table ID 50: bridging::

    field                     width   description
    --------------------------------------------------------------------
    OF_DPA_VLAN_ID            2       (N) vlan ID
    OF_DPA_TUNNEL_ID          4       tunnel ID
    OF_DPA_DST_MAC            6       (N) destination MAC
    OF_DPA_DST_MAC_MASK       6       (N) destination MAC mask
    OF_DPA_GOTO_TBL           2       goto table ID; zero to drop
    OF_DPA_GROUP_ID           4       data for GROUP action must
                                      be a L2 Interface, L2
                                      Multicast, L2 Flood,
                                      or L2 Overlay group entry
                                      as appropriate
    OF_DPA_TUNNEL_LPORT       4       unicast Tenant Bridging
                                      flows specify a tunnel
                                      logical port ID
    OF_DPA_OUT_PPORT          2       data for OUTPUT action,
                                      restricted to CONTROLLER,
                                      set to 0 otherwise

Table ID 60: acl policy::

    field                     width   description
    --------------------------------------------------------------------
    OF_DPA_IN_PPORT           4       ingress physical port number
    OF_DPA_IN_PPORT_MASK      4       ingress physical port number mask
    OF_DPA_ETHERTYPE          2       (N) ethertype
    OF_DPA_VLAN_ID            2       (N) vlan ID
    OF_DPA_VLAN_ID_MASK       2       (N) vlan ID mask
    OF_DPA_VLAN_PCP           2       (N) vlan Priority Code Point
    OF_DPA_VLAN_PCP_MASK      2       (N) vlan Priority Code Point mask
    OF_DPA_SRC_MAC            6       (N) source MAC
    OF_DPA_SRC_MAC_MASK       6       (N) source MAC mask
    OF_DPA_DST_MAC            6       (N) destination MAC
    OF_DPA_DST_MAC_MASK       6       (N) destination MAC mask
    OF_DPA_TUNNEL_ID          4       tunnel ID
    OF_DPA_SRC_IP             4       (N) source IPv4. Optional,
                                      can contain IPv4 address,
                                      must be completely masked
                                      if not used
    OF_DPA_SRC_IP_MASK        4       (N) IP Mask
    OF_DPA_DST_IP             4       (N) destination IPv4 address.
                                      Must be multicast address
    OF_DPA_DST_IP_MASK        4       (N) IP Mask
    OF_DPA_SRC_IPV6           16      (N) source IPv6 Address. Optional.
                                      Can contain IPv6 address,
                                      must be completely masked
                                      if not used
    OF_DPA_SRC_IPV6_MASK      16      (N) IPv6 mask
    OF_DPA_DST_IPV6           16      (N) destination IPv6 Address. Must
                                      be multicast address.
    OF_DPA_DST_IPV6_MASK      16      (N) IPv6 mask
    OF_DPA_SRC_ARP_IP         4       (N) source IPv4 address in the ARP
                                      payload. Only used if ethertype
                                      == 0x0806.
    OF_DPA_SRC_ARP_IP_MASK    4       (N) IP Mask
    OF_DPA_IP_PROTO           1       IP protocol
    OF_DPA_IP_PROTO_MASK      1       IP protocol mask
    OF_DPA_IP_DSCP            1       DSCP
    OF_DPA_IP_DSCP_MASK       1       DSCP mask
    OF_DPA_IP_ECN             1       ECN
    OF_DPA_IP_ECN_MASK        1       ECN mask
    OF_DPA_L4_SRC_PORT        2       (N) L4 source port, only for
                                      TCP, UDP, or SCTP
    OF_DPA_L4_SRC_PORT_MASK   2       (N) L4 source port mask
    OF_DPA_L4_DST_PORT        2       (N) L4 destination port, only for
                                      TCP, UDP, or SCTP
    OF_DPA_L4_DST_PORT_MASK   2       (N) L4 destination port mask
    OF_DPA_ICMP_TYPE          1       ICMP type, only if IP
                                      protocol is 1
    OF_DPA_ICMP_TYPE_MASK     1       ICMP type mask
    OF_DPA_ICMP_CODE          1       ICMP code
    OF_DPA_ICMP_CODE_MASK     1       ICMP code mask
    OF_DPA_IPV6_LABEL         4       (N) IPv6 flow label
    OF_DPA_IPV6_LABEL_MASK    4       (N) IPv6 flow label mask
    OF_DPA_GROUP_ID           4       data for GROUP action
    OF_DPA_QUEUE_ID_ACTION    1       write the queue ID
    OF_DPA_NEW_QUEUE_ID       1       queue ID
    OF_DPA_VLAN_PCP_ACTION    1       write the VLAN priority
    OF_DPA_NEW_VLAN_PCP       1       VLAN priority
    OF_DPA_IP_DSCP_ACTION     1       write the DSCP
    OF_DPA_NEW_IP_DSCP        1       new DSCP
    OF_DPA_TUNNEL_LPORT       4       restrict to valid tunnel
                                      logical port, set to 0
                                      otherwise.
    OF_DPA_OUT_PPORT          2       data for OUTPUT action,
                                      restricted to CONTROLLER,
                                      set to 0 otherwise
    OF_DPA_CLEAR_ACTIONS      4       if 1 packets matching flow are
                                      dropped (all other instructions
                                      ignored)

TLVs for flow delete and get stats commands are::

    field                     width   description
    --------------------------------------------------------------------
    OF_DPA_CMD                2       CMD_[DEL|GET_STATS]
    OF_DPA_COOKIE             8       Cookie
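
As a rough illustration of how small a delete or get stats request is, the sketch below packs the two TLVs into a descriptor buffer. Everything numeric here is an assumption made for illustration: the TLV header layout (4-byte type followed by 2-byte value length, little-endian), the padding of each TLV to an 8-byte boundary, the TLV type numbers, and the ``CMD_DEL`` value are all hypothetical and should be checked against the TLV definition elsewhere in this specification. Only the field widths (2 bytes for ``OF_DPA_CMD``, 8 bytes for ``OF_DPA_COOKIE``) come from the table above.

```python
import struct

# Hypothetical numeric identifiers, chosen only for this illustration.
TLV_OF_DPA_CMD = 1
TLV_OF_DPA_COOKIE = 2
CMD_DEL = 3


def pack_tlv(tlv_type: int, value: bytes) -> bytes:
    """Pack one TLV using an ASSUMED header of <u32 type><u16 len>."""
    tlv = struct.pack("<IH", tlv_type, len(value)) + value
    # Assumed rule: pad each TLV with zero bytes to an 8-byte boundary.
    return tlv + b"\x00" * (-len(tlv) % 8)


def build_flow_del(cookie: int) -> bytes:
    """Build the descriptor payload for a flow delete command."""
    cmd = pack_tlv(TLV_OF_DPA_CMD, struct.pack("<H", CMD_DEL))      # width 2
    key = pack_tlv(TLV_OF_DPA_COOKIE, struct.pack("<Q", cookie))    # width 8
    return cmd + key


if __name__ == "__main__":
    buf = build_flow_del(0x1122334455667788)
    print(len(buf), buf.hex())
```

A get stats request would differ only in the command value carried by ``OF_DPA_CMD``.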

On completion of get stats command, the descriptor buffer is written back with
the following TLVs::

    field                     width   description
    --------------------------------------------------------------------
    OF_DPA_STAT_DURATION      4       Flow duration
    OF_DPA_STAT_RX_PKTS       8       Received packets
    OF_DPA_STAT_TX_PKTS       8       Transmit packets

Possible status return codes in descriptor on completion are::

    DESC_COMP_ERR       command             reason
    --------------------------------------------------------------------
    0                   all                 OK
    -ROCKER_EFAULT      all                 head or tail index outside
                                            of ring
    -ROCKER_ENXIO       all                 address or data read err on
                                            desc buf
    -ROCKER_EMSGSIZE    GET_STATS           cmd descriptor buffer wasn't
                                            big enough to contain write-back
                                            TLVs
    -ROCKER_EINVAL      all                 invalid parameters passed in
    -ROCKER_EEXIST      ADD                 entry already exists
    -ROCKER_ENOSPC      ADD                 no space left in flow table
    -ROCKER_ENOENT      MOD|DEL|GET_STATS   cookie invalid
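
The status table above can be read as a mapping from error code to the commands that may produce it. The sketch below transcribes that table so a caller can list which errors to expect for a given flow table command. The names ``FLOW_STATUS_COMMANDS`` and ``possible_errors`` are hypothetical helpers, and the numeric values of the ``ROCKER_E*`` codes are deliberately left out because they are not given in this section.

```python
ALL_COMMANDS = frozenset({"ADD", "MOD", "DEL", "GET_STATS"})

# Transcribed from the flow table status-code table; "all" expands to
# every command defined for the flow table interface.
FLOW_STATUS_COMMANDS = {
    "ROCKER_EFAULT": ALL_COMMANDS,
    "ROCKER_ENXIO": ALL_COMMANDS,
    "ROCKER_EMSGSIZE": frozenset({"GET_STATS"}),
    "ROCKER_EINVAL": ALL_COMMANDS,
    "ROCKER_EEXIST": frozenset({"ADD"}),
    "ROCKER_ENOSPC": frozenset({"ADD"}),
    "ROCKER_ENOENT": frozenset({"MOD", "DEL", "GET_STATS"}),
}


def possible_errors(cmd: str) -> list:
    """Return the sorted error names a flow table command may complete with."""
    if cmd not in ALL_COMMANDS:
        raise ValueError("unknown command: " + cmd)
    return sorted(name for name, cmds in FLOW_STATUS_COMMANDS.items()
                  if cmd in cmds)


if __name__ == "__main__":
    for cmd in ("ADD", "MOD", "DEL", "GET_STATS"):
        print(cmd, possible_errors(cmd))
```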

Group Table Interface
---------------------

There are commands to add, modify, delete, and get stats of group table
entries. The commands are issued using the DMA CMD descriptor ring. The
following commands are defined::

    CMD_ADD:        add an entry to group table
    CMD_MOD:        modify an entry in group table
    CMD_DEL:        delete an entry from group table
    CMD_GET_STATS:  get stats for group entry

TLVs for add and modify commands are::

    field                     width   description
    --------------------------------------------------------------------
    FLOW_GROUP_CMD            2       CMD_[ADD|MOD]
    FLOW_GROUP_ID             2       Flow group ID
    FLOW_GROUP_TYPE           1       Group type:
                                        0: L2 interface
                                        1: L2 rewrite
                                        2: L3 unicast
                                        3: L2 multicast
                                        4: L2 flood
                                        5: L3 interface
                                        6: L3 multicast
                                        7: L3 ECMP
                                        8: L2 overlay
    FLOW_VLAN_ID              2       Vlan ID (types 0, 3, 4, 6)
    FLOW_L2_PORT              2       Port (types 0)
    FLOW_INDEX                4       Index (all types but 0)
    FLOW_OVERLAY_TYPE         1       Overlay sub-type (type 8):
                                        0: Flood unicast tunnel
                                        1: Flood multicast tunnel
                                        2: Multicast unicast tunnel
                                        3: Multicast multicast tunnel
    FLOW_GROUP_ACTION         nest
      FLOW_GROUP_ID           2       next group ID in chain (all
                                      types except 0)
      FLOW_OUT_PORT           4       egress port (types 0, 8)
      FLOW_POP_VLAN_TAG       1       strip outer VLAN tag (type 1
                                      only)
      FLOW_VLAN_ID            2       (types 1, 5)
      FLOW_SRC_MAC            6       (types 1, 2, 5)
      FLOW_DST_MAC            6       (types 1, 2)
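
Which TLVs apply depends on the group type, as noted in the description column above. The sketch below transcribes those "types" annotations into lookup tables so the applicable fields can be listed per group type. It assumes that every field listed after ``FLOW_GROUP_ACTION`` is nested inside it. The helper name ``fields_for`` is hypothetical, and the always-present ``FLOW_GROUP_CMD``, ``FLOW_GROUP_ID`` and ``FLOW_GROUP_TYPE`` TLVs are omitted.

```python
ALL_TYPES = frozenset(range(9))  # group types 0 (L2 interface) .. 8 (L2 overlay)

# (field name, group types the field applies to), in table order.
TOP_LEVEL = [
    ("FLOW_VLAN_ID", frozenset({0, 3, 4, 6})),
    ("FLOW_L2_PORT", frozenset({0})),
    ("FLOW_INDEX", ALL_TYPES - {0}),
    ("FLOW_OVERLAY_TYPE", frozenset({8})),
]

# Fields assumed to be nested inside FLOW_GROUP_ACTION.
NESTED = [
    ("FLOW_GROUP_ID", ALL_TYPES - {0}),
    ("FLOW_OUT_PORT", frozenset({0, 8})),
    ("FLOW_POP_VLAN_TAG", frozenset({1})),
    ("FLOW_VLAN_ID", frozenset({1, 5})),
    ("FLOW_SRC_MAC", frozenset({1, 2, 5})),
    ("FLOW_DST_MAC", frozenset({1, 2})),
]


def fields_for(group_type: int):
    """Return (top-level fields, nested action fields) for a group type."""
    if group_type not in ALL_TYPES:
        raise ValueError("unknown group type: %r" % (group_type,))
    top = [name for name, types in TOP_LEVEL if group_type in types]
    nested = [name for name, types in NESTED if group_type in types]
    return top, nested


if __name__ == "__main__":
    for group_type in sorted(ALL_TYPES):
        print(group_type, fields_for(group_type))
```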

TLVs for flow delete and get stats commands are::

    field                     width   description
    --------------------------------------------------------------------
    FLOW_GROUP_CMD            2       CMD_[DEL|GET_STATS]
    FLOW_GROUP_ID             2       Flow group ID

On completion of get stats command, the descriptor buffer is written back with
the following TLVs::

    field                     width   description
    --------------------------------------------------------------------
    FLOW_GROUP_ID             2       Flow group ID
    FLOW_STAT_DURATION        4       Flow duration
    FLOW_STAT_REF_COUNT       4       Flow reference count
    FLOW_STAT_BUCKET_COUNT    4       Flow bucket count

Possible status return codes in descriptor on completion are::

    DESC_COMP_ERR       command             reason
    --------------------------------------------------------------------
    0                   all                 OK
    -ROCKER_EFAULT      all                 head or tail index outside
                                            of ring
    -ROCKER_ENXIO       all                 address or data read err on
                                            desc buf
    -ROCKER_ENOSPC      GET_STATS           cmd descriptor buffer wasn't
                                            big enough to contain write-back
                                            TLVs
    -ROCKER_EINVAL      ADD|MOD             invalid parameters passed in
    -ROCKER_EEXIST      ADD                 entry already exists
    -ROCKER_ENOSPC      ADD                 no space left in flow table
    -ROCKER_ENOENT      MOD|DEL|GET_STATS   group ID invalid
    -ROCKER_EBUSY       DEL                 group reference count non-zero
    -ROCKER_ENODEV      ADD                 next group ID doesn't exist

References
==========

[1] OpenFlow Data Plane Abstraction (OF-DPA) Abstract Switch Specification,
Version 1.0, from Broadcom Corporation, February 21, 2014.