summer-of-code - GreenAsh: Poignant wit and hippie ramblings that are pertinent to summer-of-code
https://greenash.net.au/thoughts/topics/summer-of-code/

Import / Export API: final thoughts
2006-09-07 · Jaza · https://greenash.net.au/thoughts/2006/09/import-export-api-final-thoughts/

The Summer of Code has finally come to an end, and it's time for me to write up my final thoughts on my involvement in it. The Import / Export API module has been a long and challenging project, but it's also been great fun and has, in my opinion, been a worthwhile cause to devote my time to. My mentor has given my work the final tick of approval (which is great!), and I personally feel that the project has been an overwhelming success.

Filling out the final student evaluation for the SoC was an interesting experience, because it made me realise that as someone with significant prior experience in developing for my mentor organisation (i.e. Drupal), I was actually in the minority. Many of the questions didn't apply to me, or weren't entirely relevant, as they assumed that I was just starting out with my mentor organisation, and that the SoC was my 'gateway' to learning the ropes of that organisation. I, on the other hand, already had about 18 months of experience as a Drupal developer when the SoC began, and I always viewed SoC as an opportunity to work on developing an important large-scale module (that I wouldn't have had time to develop otherwise), rather than as a 'Drupal boot camp'.

The Import / Export API is also something of a unique project, in that it's actually quite loosely coupled to Drupal. I never envisaged that it would turn out like this, but the API is so loosely coupled that it could very easily be used as an import / export tool for almost any other application. This makes me question whether it would have been better to develop the API as a completely stand-alone project, with zero dependency on Drupal, rather than as a Drupal module with a few (albeit fairly superficial) dependencies. In this context, the API is a bit like CiviCRM: it is basically a fully-functional application (or library, in the API's case) all by itself, but it relies on Drupal for a few lil' things, such as providing a pretty face to the user, and integration as part of a content-managed web site.

The module today

For those of you who haven't tried it out yet, the API is an incredibly useful and flexible tool when it comes to getting data in and out of your site. The module currently supports importing and exporting any entity in Drupal core, in either XML or CSV format. Support for CCK nodes, node types, and fields is also included. All XML tags and CSV field names can have custom mappings defined during import or export. At the moment, the UI is very basic (the plan is to work on this some more in the future), but it exposes the essential functionality of the API well enough, and it's reasonably easy to use.
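As a rough illustration of custom field-name mappings on export, here is a Python sketch (not the module's PHP code) that writes records to CSV and renames columns in the header wherever a mapping is supplied. The `export_csv()` function and the shape of the `mappings` dictionary are assumptions made for this example.

```python
import csv
import io


def export_csv(records, fields, mappings=None):
    """Render a list of dicts as CSV text.

    fields   -- internal field names, in output order
    mappings -- optional {internal_name: output_name} overrides
                (hypothetical; stands in for the API's custom mappings)
    """
    mappings = mappings or {}
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    # The header row uses the custom name where one is given, else the internal name.
    writer.writerow([mappings.get(f, f) for f in fields])
    for record in records:
        writer.writerow([record.get(f, "") for f in fields])
    return out.getvalue()


users = [{"uid": 1, "name": "admin"}, {"uid": 2, "name": "jaza"}]
print(export_csv(users, ["uid", "name"], {"uid": "user_id"}))
```

Only the header changes; the data columns are still looked up by their internal names, which is what lets the same definition serve many different output layouts.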

The module is superior to existing import modules because it allows you to import a variety of different entities while maintaining and managing the relationships between those entities. For example: nodes, comments, and users are all different entities, but they are also all related to each other; nodes are written by users, and comments are written about particular nodes by users. You could import nodes and users using the node_import and user_import modules, but these two modules would make no effort to link your nodes to your users, or to maintain any link that existed in your imported data. The Import / Export API recognises and maintains all such links.
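To make the relationship handling concrete, here is a minimal Python sketch (the real module is PHP for Drupal) of the ID remapping such an import involves. It assumes records arrive keyed by their source-site IDs and that new IDs come from simple counters; `import_entities()` is a hypothetical name, not part of the module.

```python
from itertools import count


def import_entities(data):
    """Import users, nodes and comments, rewriting their cross-references.

    `data` holds records keyed by the IDs they had on the source site.
    New IDs come from simple counters here, standing in for whatever
    the target database would generate.
    """
    id_maps = {"user": {}, "node": {}, "comment": {}}
    # Pretend the target site already has rows with IDs below 100.
    counters = {kind: count(100) for kind in id_maps}
    imported = {kind: [] for kind in id_maps}

    # Order matters: users first, because nodes and comments point at them.
    for user in data["users"]:
        new_uid = next(counters["user"])
        id_maps["user"][user["uid"]] = new_uid
        imported["user"].append({**user, "uid": new_uid})

    for node in data["nodes"]:
        new_nid = next(counters["node"])
        id_maps["node"][node["nid"]] = new_nid
        imported["node"].append(
            {**node, "nid": new_nid, "uid": id_maps["user"][node["uid"]]})

    for comment in data["comments"]:
        new_cid = next(counters["comment"])
        id_maps["comment"][comment["cid"]] = new_cid
        imported["comment"].append({
            **comment,
            "cid": new_cid,
            "nid": id_maps["node"][comment["nid"]],
            "uid": id_maps["user"][comment["uid"]],
        })
    return imported


source = {
    "users": [{"uid": 5, "name": "alice"}, {"uid": 6, "name": "bob"}],
    "nodes": [{"nid": 9, "uid": 5, "title": "Hello"}],
    "comments": [{"cid": 3, "nid": 9, "uid": 6, "body": "Nice post"}],
}
result = import_entities(source)
print(result["comment"])
# [{'cid': 100, 'nid': 100, 'uid': 101, 'body': 'Nice post'}]
```

The key step is that every foreign key is rewritten through the ID maps. Two independent importers would each assign fresh IDs but leave the references pointing at the old ones.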

As for stability, the API still has a few significant known bugs lurking around in it, but overall it's quite stable and reliable. The API is still officially in beta mode, and more beta testing is still very much welcome. Many thanks to the people who have dedicated their time to testing and bug fixing thus far (you know who you are!) - it wouldn't be the useful tool that it is without your help.

The module tomorrow

And now for the most important question of all: what is the future of the API? What additional features would I (and others) like to see implemented post-SoC? What applications are people likely to build on top of it? And will the module, in some shape or form, to a greater or lesser extent, ever become part of Drupal core?

First, the additional features. I was hoping to get some of these in as part of the SoC timeframe, but as it turned out, I barely had time to meet the base requirements that I originally set for myself. So here's my wish list for the API (and in my case, mere wishing ain't gonna make 'em happen - only coding will!):

  1. File handling. The 'file' field type in the current API is nothing more than a stub - at present, it's almost exactly the same as the 'string' field type. I would like to actually implement file handling, so that files can be seamlessly imported and exported, along with the database-centric data in a Drupal site. Many would argue that this is critical functionality. I feel you, folks - that's why this is no. 1.
  2. Filtering and sorting in queries. The 'db' engines of the API are cutting-edge in their support for references and relationships, but they currently lack all but the most basic support for filtering and sorting in database queries. Ideally, the API will have an extensible system for this, similar to what is currently offered for node queries by the views module. A matching UI of views' calibre would be awesome, too.
  3. Good-looking and flexible UI. The current UI is about as basic as it gets. The long-term plan is to move the UI out into a separate project (it's already a separate module), and to expose much more of the API through the interface, e.g. disabling import / export of specified fields, forcing of ID key generation / key matching and updating, and control over alternate key handling. There are also plenty of cosmetic improvements that could be made to the UI, e.g. more wizard-like form dialogs (I think I'll wait for Drupal 5.0 / FAPI 2.0 before doing this), and flexible control of output format (choice between file download, save output on server, display raw output, etc.).
  4. Validate and submit (and more?) callbacks. This is really kind of dependent on the API's status in regards to going into Drupal core (see further down). But the general plan is to implement FAPI-like validate and submit callbacks within the API's data definition system.
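Since validate and submit callbacks are only planned at this point, any design is speculative. Below is a minimal Python sketch (the module itself is PHP) of how FAPI-style callbacks could hang off a data definition: every field's validators run first, and the submit callbacks run only if nothing failed. The `#validate` / `#submit` keys imitate Forms API naming; `run_import()` and `not_empty()` are hypothetical.

```python
def run_import(definition, record):
    """Validate a record against per-field callbacks, then submit it.

    Returns (ok, errors). Submit callbacks run only if every
    validate callback passed, mirroring the Forms API flow.
    """
    errors = []
    for field, attrs in definition["fields"].items():
        for validate in attrs.get("#validate", []):
            message = validate(record.get(field))
            if message:
                errors.append(f"{field}: {message}")
    if errors:
        return False, errors
    for submit in definition.get("#submit", []):
        submit(record)
    return True, []


def not_empty(value):
    """A validator returns an error message, or None if the value is fine."""
    return None if value else "must not be empty"


saved = []
user_definition = {
    "fields": {"name": {"type": "string", "#validate": [not_empty]}},
    "#submit": [saved.append],  # stand-in for "write to the database"
}

print(run_import(user_definition, {"name": ""}))      # (False, ['name: must not be empty'])
print(run_import(user_definition, {"name": "jaza"}))  # (True, [])
```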

Next, there are the possible applications of the API. The API is a great foundation for a plethora of possibilities. I have faith that, over the course of the near future, developers will start to have a look at the API, and that they will recognise its potential, and that they will start to develop really cool things on top of it. Of course, I may be wrong. It's possible that almost no developers will ever look at the API, and that the API will rot away in the dark corners of Drupal contrib, before sinking slowly into the depths of oblivion. But I hope that doesn't happen.

Some of the possible applications that have come to my mind, and that other people have mentioned to me:

  • Import / export (duh!)
  • Automated backup
  • Migration from test to production environment
  • Production site migration
  • Site merging
  • Multisite content sharing (a la publish and subscribe)
  • Migration from other software (e.g. Movable Type, WordPress)

Finally, there is the question of whether or not (and what parts of) the API will ever find itself in Drupal core. From the very beginning, my mentor Adrian has been letting me in on his secret super-evil plan for world domination (or, at the least, for Drupal domination). I can confide to all of you that getting parts of the API in core is part of this plan. In particular, the data definition system is a potential candidate for what will be the new 'data model / data layer / data API' foundation of FAPI 3.0 (i.e. Drupal post-upcoming-5.0-release).

However, I cannot guarantee that the data definition system of the API will ever make it into core, and I certainly cannot predict in what form it will be by the time that it gets in (if it gets in, that is). Adrian has let slip a few ideas of his own lately (in the form of PHP pseudo-code), and his ideas for a data definition system seem to be quite different from mine. No doubt every other Drupal developer will also have their own ideas on this - after all, it will be a momentous change for Drupal when it happens, and everyone has a right to be a part of that change. Anyway, Adrian has promised to reveal his grand plans for FAPI 3.0 during his presentation at the upcoming Brussels DrupalCon (which I unfortunately won't be able to attend), so I'm sure that after that has happened, we'll all be much more enlightened.

The API's current data definition system is not exactly perfectly suited for Drupal core. It was developed specifically to support a generic import / export system, and that fact shows itself in many ways. The system is based around directly reflecting the structure of the Drupal database, for the purposes of SQL query generation and plain text transformation. That will have to change if the system goes into Drupal core, because Drupal core has very different priorities: it is more concerned with a flexible callback system, robust integration with the form generation system, and rock-solid performance all round. Whether the API's data definition system can adapt to meet these demands remains to be seen.

Further resources

Well, that's about all that I have to say about the Import / Export API module, and about my involvement in the 2006 Google Summer of Code. But before you go away, here are some useful links to get you started on your forays into the world of importing and exporting in Drupal:

Many thanks to Angie Byron (a.k.a. webchick) for your great work last year on the Forms API QuickStart guide and reference guide documents, which proved to be an invaluable template for me to use in writing these documents for the Import / Export API. Thanks also, Angie, for your great work as part of the SoC organising team this year!

And, last but not least, a big thank-you to Adrian, Sami, Rob, Karoly, Moshe, Earl, Dan, and everyone else who has helped me to get through the project, and to learn heaps and to have plenty of fun along the way. I couldn't have done it without you - all of you!

SoC - it's been a blast. ;-)

Import / Export API: progress report #4
2006-08-11 · Jaza · https://greenash.net.au/thoughts/2006/08/import-export-api-progress-report-4/

In a small, poorly ventilated room, somewhere in Australia, there are four geeky fingers and two geeky thumbs, and they are attached to two geeky hands. All of the fingers and all of the thumbs are racing haphazardly across a black keyboard, trying to churn out PHP; but mostly they're just tapping repeatedly, and angrily, on the 'backspace' key. A pair of eyes squint tiredly at the LCD monitor before them, trying to discern whether the minuscule black dot that they perceive is a speck of dirt, or yet another pixel that has gone to pixel heaven.

But the black dot is neither of these things. The black dot is but a figment of this young geek's delirious and overly-caffeinated imagination. Because, you see, on this upside-down side of the world, where the seasons are wrong and the toilets flush counter-clockwise, there is a Drupaller who has been working on the Summer of Code all winter long. And he has less than two weeks until the deadline!

That's right: in just 10 days, the Summer of Code will be over, and the Drupal Import / Export API will (hopefully) have met its outcomes, and will be ready for public use. That is the truth. Most of the other statements that I made above, however, are not the truth. In fact:

Anyway, on to the real deal...

The current status of the Import / Export API is as follows. The DB, XML, and CSV engines are now all complete, and are (I hope) at beta-quality. However, they are still in need of a fair bit of testing and bug fixing. The same can be said for the alternate key and ID key handling systems. The data definitions for all Drupal 4.7 core entities (except for a few that weren't on the to-do list) are now complete.

There are two things that still need to be done urgently. The first of these is a system for passing custom mappings (and other attributes), for any field, into the importexportapi_get_data() function. The whole point of the mapping system is that field mappings are completely customisable, but this point cannot be realised until custom mappings can actually be provided to the API. The second of these is the data definitions for CCK fields and node types. Many people have been asking me about this, including my mentor Adrian, and it will certainly be very cool when it is ready and working.
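The post doesn't give the signature of `importexportapi_get_data()`, so the following is only a Python sketch of the underlying idea: merging caller-supplied, per-field attribute overrides (such as a custom XML tag name) into a definition before it is used, without modifying the original. The `#xml_mapping` attribute and `merge_attributes()` are hypothetical.

```python
import copy


def merge_attributes(base, overrides):
    """Recursively merge `overrides` into a deep copy of `base`.

    Dict values are merged key by key, so nested fields keep any
    attributes that are not overridden; any other value replaces
    the original one.
    """
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_attributes(merged[key], value)
        else:
            merged[key] = value
    return merged


user_definition = {
    "type": "entity",
    "fields": {
        "uid": {"type": "int", "#xml_mapping": "uid"},
        "name": {"type": "string", "#xml_mapping": "name"},
        "roles": {
            "type": "array",
            "fields": {"rid": {"type": "int", "#xml_mapping": "rid"}},
        },
    },
}

# Custom mappings supplied by the caller, e.g. from a UI form.
custom = {"fields": {"name": {"#xml_mapping": "username"},
                     "roles": {"fields": {"rid": {"#xml_mapping": "role_id"}}}}}

result = merge_attributes(user_definition, custom)
print(result["fields"]["name"])  # {'type': 'string', '#xml_mapping': 'username'}
```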

One more thing that also needs to be done before the final deadline is to write a very basic UI module that allows people to actually demo the API. I won't be able to build my dream import / export whiz-bang get-and-put-anything-in-five-easy-steps UI in 10 days. But I should be able to build something functional enough to let people (such as my reviewers) test out and enjoy the key things that the API can do. Once this UI module is written, I will be removing all of the little bits of UI cruftiness that are currently in the main module.

Was there something I didn't mention? Ah, yes. Documentation, documentation, documentation - how could I forget thee? As constant as the northern star, and almost as hard to find, the task of documentation looms over us all. I will be endeavouring to produce a reference guide for the import / export API within the deadline, which will be as complete as possible. But no guarantees that it will be complete within that time. The reference guide will focus on documenting the field types and their possible attributes, much like the Drupal forms API reference does. Other docs - such as tutorials, explanations, and tips - will come later.

There are many things that the API is currently lacking, and that I would really like to see it ultimately have. Most of these things did not occur to me until I was well into coding the project, and none of them will actually be coded until after the project has finished. One of these things is an extensible query filtering and sorting system, much like the system that the amazing views module boasts. Another is a validation and fine-grained error-handling system (mainly for imports).

But more on these ideas another time. For now, I have a module to finish coding.

Import / Export API: progress report #3
2006-07-15 · Jaza · https://greenash.net.au/thoughts/2006/07/import-export-api-progress-report-3/

The Summer of Code is now past its half-way mark. For some reason, I passed the mid-term evaluation, and I'm still here. Blame my primary mentor, Adrian, who was responsible for passing me with flying colours. The API is getting ever closer to meeting its success criteria, although not as close as I'd hoped for it to be by this point.

My big excuse for being behind schedule is that I got extremely sidetracked, with my recent work on the Drupal core patch to get a subset of CCK into core (CCK is the Content Construction Kit module, the successor to Flexinode, a tool for allowing administrators to define new structured content types in Drupal). However, this patch is virtually complete now, so it shouldn't be sidetracking me any more.

A very crude XML import and export is now possible. This is a step up from my previous announcements, where I had continually given the bad news that importing was not yet ready at all. You can now import data from an XML file into the database, and stuff will actually happen! But just what you can import is very limited; and if you step outside of that limit, then you're stepping beyond the API's still-constricted boundaries.

The ID and reference handling system - which is set to be one of the API's killer features - is only half-complete at present. I've spent a lot of time today working on the ID generation system, which is an important part of the overall reference handling system, and which is now almost done. This component of the API required a lot of thinking and planning before it happened, as can be seen from the very complicated Boolean decision table that I've documented. This is for working out the various scenarios that need to be handled, and for planning the control logic that determines what actions take place for each scenario.
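The actual decision table isn't reproduced here, so the rules below are purely illustrative: a Python sketch of mapping a handful of Boolean conditions to an import action, plus an exhaustive enumeration of all 16 input combinations, which is a cheap way to check that every row of such a table has an answer. The condition names, the three actions and `decide()` are assumptions.

```python
from enum import Enum
from itertools import product


class Action(Enum):
    UPDATE_BY_ID = "update the existing row matched by ID"
    UPDATE_BY_ALT_KEY = "update the existing row matched by alternate key"
    INSERT_NEW_ID = "insert as a new row with a generated ID"


def decide(has_id, id_exists, has_alt_key, alt_key_exists):
    """Choose an import action from four Boolean conditions.

    Illustrative rules only: an ID match wins over an alternate key
    match, and anything unmatched becomes a new row with a fresh ID.
    """
    if has_id and id_exists:
        return Action.UPDATE_BY_ID
    if has_alt_key and alt_key_exists:
        return Action.UPDATE_BY_ALT_KEY
    return Action.INSERT_NEW_ID


# Enumerate every combination so no row of the table is left undecided.
table = {combo: decide(*combo) for combo in product([False, True], repeat=4)}
for combo, action in sorted(table.items()):
    print(combo, action.name)
```

Some combinations (no ID key, yet "ID exists") can't occur in practice; a fuller table would flag those explicitly rather than letting them fall through to a default.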

Unfortunately, as I said, the reference handling system is only half-done. And it's going to stay that way for a while, because I'm away on vacation for the next full week. I hate to just pack up and leave at this critical juncture of development, but hey: I code all year round, and I only get to ski for one week of the year! Anyway, despite being a bit behind schedule, I'm very happy with the quality and the cleanliness of the code thus far (same goes for the documentation, within and outside of the code). And in the Drupal world, the general attitude is that it's better to get something done a bit late, as long as it's done right. I hope that I'm living up to that attitude, and I wish that the rest of the world followed the same mantra.

Import / Export API: progress report #2
2006-06-26 · Jaza · https://greenash.net.au/thoughts/2006/06/import-export-api-progress-report-2/

The mid-program mentor evaluation (a.k.a. crunch time) for the Summer of Code is almost here, and as such, I've been working round-the-clock to get my project looking as presentable as possible. The Import / Export API module has made significant progress since my last report, but there's still plenty of work left to be done.

It gives me great pride to assert that if you download the module right now, you'll find that it actually does something. :-) I know, I know - it doesn't do much (in fact, I may simply be going delusional and crazy from all the coding I've been doing, and it may actually do nothing) - but what it does do is pretty cool. The XML export engine is now functional, which means that you can already use the module to export any entities that currently have a definition available (which is only users and roles, at present), as plain-text XML:

XML export screenshot

The import system isn't quite ready to go as yet, but the XML-to-array engine is pretty much done, and with a little more work, the array-to-DB engine will be done as well.

The really exciting stuff, however, has been happening under the hood, in the dark and mysterious depths of the API's code. Alright, alright - exciting if you're the kind of twisted individual that believes recursive array building and refactored function abstraction are hotter than Angelina's Tomb Raiders™. But hey, who doesn't?

Some of the stuff that's been keeping me busy lately:

  • The new Get and Put API has been implemented. All engines now implement either a 'get' or a 'put' operation (e.g. get data from the database, put data into an XML string). All exports and imports are now distinctly broken up into a 'get' and a 'put' component, with the data always being in a standard structured form in-between.
  • Nested arrays are now fully supported, even though I couldn't find any structures in Drupal that require a nested array definition (I wrote a test module that uses nested arrays, just to see that it works).
  • The definition building system has been made more powerful. Fields can now be modified at any time after the definition 'library' is first built. Specific engines can take the original definitions, and build them using their own custom callbacks, for adding and modifying custom attributes.
  • Support for alternate key fields has been added. This means that you can export unique identifiers that are more human-friendly than database IDs (e.g. user names instead of user IDs), and these fields will automatically get generated whenever their ID equivalents are found in a definition. This will get really cool when the import system is ready - you will be able to import data that only has alternate keys, and not ID keys - and the system will be able to translate them for you.
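To show what translating alternate keys into ID keys on import involves, here is a Python sketch using an in-memory SQLite table. The simplified `users` table, the `author` field carrying a user name, and `fill_uids_from_names()` are all assumptions for illustration, not the module's code.

```python
import sqlite3


def fill_uids_from_names(conn, nodes):
    """Give each node a 'uid' (ID key) by looking up its 'author'
    (a user name, i.e. an alternate key) in the users table.
    Nodes that already carry a uid are left alone."""
    lookup = dict(conn.execute("SELECT name, uid FROM users"))
    resolved = []
    for node in nodes:
        if "uid" not in node:
            name = node["author"]
            if name not in lookup:
                raise KeyError(f"unknown user name: {name!r}")
            node = {**node, "uid": lookup[name]}
        resolved.append(node)
    return resolved


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (uid INTEGER PRIMARY KEY, name TEXT UNIQUE)")
conn.executemany("INSERT INTO users (uid, name) VALUES (?, ?)",
                 [(1, "admin"), (7, "jaza")])

nodes = [{"title": "Progress report #2", "author": "jaza"}]
print(fill_uids_from_names(conn, nodes))
# [{'title': 'Progress report #2', 'author': 'jaza', 'uid': 7}]
```

Export is the same lookup run in the other direction (uid to name), which is why the alternate key fields can be generated automatically wherever their ID equivalents appear.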

Also, for those of you that want to get involved in this project, and to offer your feedback and opinions, head on over to the Import / Export API group, which is part of the new groups.drupal.org community site, and which is open for anyone to join. I'd love to hear whatever you may have to say about the module - anything from questions about how it can help you in your quest for the holy grail (sorry, only African Swallows can be exported at this time, support for European Swallows is not yet ready), to complaints about it killing your parrot (be sure that it isn't just resting) - all this and more is welcome.

I hope to have more code, and another report, ready in the near future. Thanks for reading this far!

Import / Export API: progress report #1
2006-06-06 · Jaza · https://greenash.net.au/thoughts/2006/06/import-export-api-progress-report-1/

It's been almost two weeks since the 2006 Summer of Code began, and with it, my work to develop an import / export API module for Drupal. In case you missed it, my work is being documented on this wiki. My latest code is now also available as a project on drupal.org. Since I've barely started, I think that this is a stupid time to sit back and reflect on what I've done so far. But I'm doing it anyway.

Let's start with some excuses. I'm working full-time at the moment, I've got classes on in between, and I just joined the cast of an amateur musical (seriously, what was I thinking?). So due to my current shortage of time, I've decided to focus on documentation for now, which - let's face it - should ideally be done in large quantities before any code is produced, anyway. So I've posted a fair bit of initial documentation on the wiki, including research on existing import / export solutions in Drupal, key features of the new API, and possible problems that will be encountered.

Last weekend, I decided that I was kind of sick of documentation, and that I could ignore the urge to code no longer. Hence, the beginnings of the API are now in place, and are up in Drupal CVS. I will no doubt be returning to documentation over the next few days, in the hope of fattening up my shiny new wiki, which is currently looking rather anorexic.

On a related note: anonymous commenting has been disabled on the wiki, as it was receiving unwelcome comment spam. If you want to post comments, you will HaveToLogin using your name InCamelCase (which is getting on my nerves a bit - but I have to admit that it does the job and does it well).

So far, I've coded the first draft of the data definition for the 'user' entity, and in the process, I've defined-through-experimentation what a data definition will look like in my module. The data definition attributes and structure are currently undocumented, and I see no reason to change that until it all matures a lot more. But ultimately, the plan is to have a reference for it, similar to the Drupal forms API reference.

There are six 'field types' in the current definition system: string (the default), int, float, file, array, and entity. An 'entity' is the top-level field, and is for all practical purposes not a field, but is rather the thing that fields go in. An array is for holding lists of values, and is what will be used for representing 1-M (and even N-M) data within the API. Note to self: support for nested arrays is currently lacking, and is desperately needed.
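The definition structure was still undocumented at this stage, so the following Python sketch only illustrates the rules described above: six field types, 'string' as the default, 'entity' at the top level, and 'array' for lists of values (nested arrays included). The dictionary layout and `check_definition()` are hypothetical.

```python
FIELD_TYPES = {"string", "int", "float", "file", "array", "entity"}


def check_definition(field, path="", is_root=True):
    """Return a list of problems found in a definition tree.

    Rules assumed for this sketch: 'type' defaults to 'string', only the
    top-level field may be an 'entity', and only 'entity' and 'array'
    fields may contain child fields (so arrays can nest).
    """
    problems = []
    ftype = field.get("type", "string")
    label = path or "<root>"
    if ftype not in FIELD_TYPES:
        problems.append(f"{label}: unknown type {ftype!r}")
    if ftype == "entity" and not is_root:
        problems.append(f"{label}: 'entity' is only allowed at the top level")
    children = field.get("fields", {})
    if children and ftype not in ("entity", "array"):
        problems.append(f"{label}: fields of type {ftype!r} cannot contain child fields")
    for name, child in children.items():
        child_path = f"{path}.{name}" if path else name
        problems.extend(check_definition(child, child_path, is_root=False))
    return problems


user = {
    "type": "entity",
    "fields": {
        "uid": {"type": "int"},
        "name": {},  # no type given, so 'string'
        "picture": {"type": "file"},
        "roles": {  # a 1-M list of roles, held in an array
            "type": "array",
            "fields": {"rid": {"type": "int"}, "name": {}},
        },
    },
}
print(check_definition(user))  # []

broken = {
    "type": "entity",
    "fields": {
        "uid": {"type": "int", "fields": {"x": {}}},
        "account": {"type": "entity"},
    },
}
print(check_definition(broken))
```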

I have also coded the beginnings of the export engine. This engine is currently capable of taking a data definition, querying the database according to that definition, and providing an array of results, that are structured into the definition (as 'value' fields), and that can then be passed to the rendering part of the engine. The next step is to actually write the rendering part of the engine, and to plug an XML formatter into this engine to begin with. Once that's done, it will be possible to test the essentials of the export process (i.e. database -> array data -> text file data) from beginning to end. I think it's important to show this end-to-end functionality as early on as possible, to prove to myself that I'm on the right track, and to provide real feedback that the system is working. Once this is done, the complexities can be added (e.g. field mapping system, configurable field output).
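The end-to-end export path (database -> array data -> text file data) can be sketched in a few lines of Python, with SQLite standing in for Drupal's database and `xml.etree.ElementTree` doing the rendering. This is a simplification: rows come back as plain dicts rather than as 'value' entries attached to the definition, and `get_from_db()`, `put_to_xml()` and the definition keys (`table`, `name`, `plural`) are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET


def get_from_db(conn, definition):
    """'Get' step: query the table named in the definition and return
    one dict per row, keyed by field name."""
    columns = list(definition["fields"])
    # Identifiers come from the (trusted) definition, never from user input.
    sql = (f"SELECT {', '.join(columns)} FROM {definition['table']} "
           f"ORDER BY {columns[0]}")
    return [dict(zip(columns, row)) for row in conn.execute(sql)]


def put_to_xml(definition, rows):
    """'Put' step: render the structured rows as an XML string."""
    root = ET.Element(definition["plural"])
    for row in rows:
        item = ET.SubElement(root, definition["name"])
        for field, value in row.items():
            ET.SubElement(item, field).text = str(value)
    return ET.tostring(root, encoding="unicode")


user_definition = {
    "name": "user", "plural": "users", "table": "users",
    "fields": {"uid": {"type": "int"}, "name": {"type": "string"}},
}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (uid INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(7, "jaza"), (1, "admin")])

output = put_to_xml(user_definition, get_from_db(conn, user_definition))
print(output)
# <users><user><uid>1</uid><name>admin</name></user><user><uid>7</uid><name>jaza</name></user></users>
```

Keeping the two steps separate means a CSV or other formatter could replace `put_to_xml()` without touching the query side.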

Overall, what I've coded so far looks very much like a cross between the forms API (with _alter() hooks, recursive array building, and extensible definitions), and the views module (with a powerful definition system, that gets used to build queries). Thank you to the respective authors of both these systems: Adrian / Vertice (one of my mentors); and Earl / Merlinofchaos (who is not my mentor, but who is a mentor, as well as a cool cool coder). Your efforts have provided me with heaps of well-engineered code to copy - er, I mean, emulate! If my project is able to be anywhere near as flexible or as powerful as either of these two systems, I will be very happy.
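The `_alter()` pattern borrowed from the Forms API can be sketched in Python with two plain lists of callbacks standing in for Drupal's module hook discovery. One set of hooks contributes definitions, and the other may modify the combined library once it has been built. `build_library()`, the hook lists and the example hooks are hypothetical names.

```python
definition_hooks = []  # functions returning {entity_name: definition}
alter_hooks = []       # functions that modify the built library in place


def build_library():
    """Collect definitions from every registered hook, then let each
    alter hook modify the combined library (Forms API-style _alter)."""
    library = {}
    for hook in definition_hooks:
        library.update(hook())
    for alter in alter_hooks:
        alter(library)
    return library


def user_definitions():
    return {"user": {"type": "entity",
                     "fields": {"uid": {"type": "int"}, "name": {}}}}


def add_signature_field(library):
    # A third-party module bolting an extra field onto 'user'.
    library["user"]["fields"]["signature"] = {"type": "string"}


definition_hooks.append(user_definitions)
alter_hooks.append(add_signature_field)

print(sorted(build_library()["user"]["fields"]))  # ['name', 'signature', 'uid']
```

Because each definition hook returns a fresh dictionary, rebuilding the library never accumulates changes from earlier alter passes.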

Thanks also to Adrian and to Sami for the feedback that you've given me so far. I've been in contact with both of my mentors, and both of them have been great in terms of providing advice and guidance.
